Abstract


We present role logic, a notation for describing properties 
of relational structures in shape analysis, databases, and 
knowledge bases. We construct role logic using the ideas 
of de Bruijn’s notation for lambda calculus, an encoding of 
first-order logic in lambda calculus, and a simple rule for 
implicit arguments of unary and binary predicates. 

The unrestricted version of role logic has the expressive power of first-order logic with transitive closure. Using a syntactic restriction on role logic formulas, we identify a natural fragment RL² of role logic. We show that the RL² fragment has the same expressive power as two-variable logic with counting C², and is therefore decidable.

We present a translation of an imperative language into the decidable fragment RL², which allows compositional verification of programs that manipulate relational structures. In addition, we show how RL² encodes boolean shape analysis constraints and an expressive description logic.
1 Introduction


Systems as relational structures. Complex systems 
arising in many areas of Computer Science can be naturally 
represented as relational structures. The state of an im- 
perative program can be specified using sets and relations 
denoted by unary and binary predicates [24, 32, 66, 8], es- 
pecially for object-oriented programs [36, 63]; a relational 
database is a finite relational structure [18, 16]; knowledge 
bases and deductive databases can also be based on predi- 
cate logic [1, 41, 53]. 

Shape analysis. Shape analysis techniques [65, 29, 33, 26, 
27, 25, 17, 40, 39, 43, 37, 55] can verify and derive precise 
properties of objects in the heap. Shape analysis is therefore 
important for reasoning about programs written in modern 
imperative programming languages. Shape analysis is also 
promising as a general-purpose verification technique, be- 
cause of its ability to reason about graphs as general struc- 
tures, and the ability to summarize properties of unbounded 
sets of objects. 

Many of the shape analysis techniques have a logical 
foundation: [65] is based on (two-valued and three-valued) 
first-order logic with transitive closure, [39, 40, 37, 55] 
is based on monadic second-order logic of trees, [26, 27] 
is based on graph grammars which are closely related to 
monadic second-order logic of trees [62]. Theorem proving 
is used in [33] to derive consequences of axioms about data 
structures. Many shape analyses perform abstract interpre- 
tation [19] to synthesize loop invariants [65, 29, 43]. 


Role logic. This paper presents role logic, a notation for describing properties of relational structures in shape analysis, databases, and knowledge bases. Role logic is an attempt to combine the simplicity of the role declarations of [43] with a transparent connection to the well-established first-order logic.

On the one hand, the full role logic has the expressive power of first-order logic with transitive closure, which makes it as expressive as the logic of [65, 36] and more expressive than the original role constraints [43]. For example, role logic is closed under all propositional operations and generalizes boolean shape analysis constraints [48]. Role logic formulas easily translate into the traditional first-order logic notation.

On the other hand, like the specialized notation for 
declaring roles in [43], role logic allows natural description of 
common properties of imperative data structures with mu- 
table references. Like dynamic logics [31] and description 
logics [1], role logic allows suppressing names of variables, 
which often leads to concise specifications. The conciseness 
of role logic makes it an appealing choice for lightweight 
annotations in a programming language. 

Another property that role logic shares with description logics is that an interesting subset of role logic is decidable. We show the decidability of the fragment RL² of role logic in Section 4 by establishing a correspondence with the two-variable logic with counting C² [30, 57]. Many description logics are known to be representable in C² but are potentially weaker than C²; in contrast, the fragment RL² of role logic matches the expressive power of C² precisely.
Contributions. The following are the main contributions 
of this paper: 

1. We introduce role logic, which applies the ideas of implicit arguments and de Bruijn's lambda calculus notation to first-order logic (Section 3). The result is a concise way of specifying properties of first-order structures that arise in shape analysis, databases, and knowledge bases.


2. We define a variable-free subset RL² of role logic (Section 4). We give a translation of RL² formulas to formulas of two-variable logic with counting C². This translation implies that RL² is decidable, because C² is decidable [30]. We further give a translation of C² formulas to RL² formulas. These two translations imply that RL² is exactly as expressive as C².


3. As the main application of role logic, in Section 5.1 we present a compositional shape analysis technique. We introduce a unified language for writing implementations, specifications, and conformance claims. The constructs of the language denote relations on program states expressible in the decidable fragment RL². The analysis technique is based on generating verification conditions in RL² and applying the decision procedure for RL². The analysis verifies the correctness of the dynamically changing referencing relationships between objects by showing that procedures conform to their specifications. By conjoining procedure specifications with global invariants, the analysis can also show that the program preserves the key data structure consistency properties necessary for the correct execution of the program.


4. We present two additional applications of role logic:

(a) we show in Section 5.3 that a subset of role logic RL² naturally corresponds to an expressive description logic [1, Chapter 5];

(b) we note in Section 5.2 that boolean shape analysis constraints [48], which can describe the basic structure of data-flow facts in [65], are a subset of constraints expressible in role logic.


2 Example 


To give a flavor of role logic, we present an example that 
illustrates one aspect of a client-server manager system that 
assigns clients to servers. Figure 1 is a standard object 
model that graphically displays the system, using boxes to 
represent sets, arrows to represent relations, and intervals 
N..M to represent constraints on relations. Figure 2 de- 
scribes the same system using role logic. Figure 3 presents 
a fragment of the code of the system. The code is expressed 
in an imperative language extended with specification con- 
structs. 


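[Figure 1 diagram: boxes for the sets (including WaitingClients), with the relation server constrained to 1..1 and the relation clients constrained to 0..5.]

Figure 1: An object model for a component of a client-server manager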


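$$
\begin{aligned}
GlobalInvariant = {} & \{Servers\} \land (\mathrm{disjoint}\ Servers, Clients) \land {} \\
& (\mathrm{partition}\ Clients;\ WaitingClients, AssignedClients) \land {} \\
& [[server \Rightarrow AssignedClients' \land Servers]] \land {} \\
& [[clients \Leftrightarrow {\sim}server]] \land {} \\
& [AssignedClients \Rightarrow \mathrm{card}^{=1}\, server] \land {} \\
& [Servers \Rightarrow \mathrm{card}^{\le 5}\, clients]
\end{aligned}
$$

Example consequence:

$$
P = [WaitingClients \Rightarrow [\neg(clients \lor server \lor {\sim}clients \lor {\sim}server)]]
$$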


Figure 2: Global constraints of the client-server manager, 
expressed in role logic 


Global constraints. Figure 2 describes the global constraints of a client-server manager system using a conjunction of role logic formulas. There are two basic kinds of objects in the system: servers and clients. We model these objects using two disjoint sets Clients and Servers. The set Clients is further partitioned into the set AssignedClients of objects that have been assigned to servers, and the set WaitingClients of objects that have not been assigned yet. The disjoint, partition, and other constructs of the algebra of sets and relations ($\cap$, $\cup$, $\setminus$) are definable in role logic.

We require the set Servers to be non-empty, which we denote by $\{Servers\}$, with the meaning $\exists x.\, Servers(x)$. The constraint $[[server \Rightarrow AssignedClients' \land Servers]]$ translates to $\forall x.\forall y.\ server(x,y) \Rightarrow AssignedClients(x) \land Servers(y)$. Namely, the brackets [ ] correspond to a universal quantifier. An occurrence of a binary predicate (such as server) is implicitly supplied with the previous-innermost bound variable (here, x) and the innermost bound variable (here, y). An occurrence of a unary predicate such as Servers is supplied with the innermost bound variable (y), unless the unary predicate is primed, in which case the previous-innermost bound variable (in this case x) is supplied instead. The constraint $[[clients \Leftrightarrow {\sim}server]]$ means that the relation clients is the inverse of the relation server. The constraint $[Servers \Rightarrow \mathrm{card}^{\le 5}\, clients]$ translates into the formula $\forall x.\ Servers(x) \Rightarrow \exists^{\le 5} y.\ clients(x,y)$ in first-order logic with counting quantifiers.
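The first-order readings above can be checked mechanically on a concrete finite structure. The following Python sketch is only an illustration, not the decision procedure of Section 4: the model (objects s1, s2, c1, c2, c3 and their relations) is hypothetical, and evaluating GlobalInvariant and P on a single model tests consistency but does not establish the validity of GlobalInvariant ⇒ P.

```python
from itertools import product

# Hypothetical finite model of the client-server manager (illustrative data only).
objects = ["s1", "s2", "c1", "c2", "c3"]
Servers = {"s1", "s2"}
Clients = {"c1", "c2", "c3"}
AssignedClients = {"c1", "c2"}
WaitingClients = {"c3"}
server = {("c1", "s1"), ("c2", "s2")}      # each assigned client -> its server
clients = {(y, x) for (x, y) in server}     # clients is the inverse of server


def implies(p, q):
    return (not p) or q


def count(x, rel):
    """Number of objects y with rel(x, y), i.e. a counting quantifier over y."""
    return sum(1 for y in objects if (x, y) in rel)


def global_invariant():
    pairs = list(product(objects, repeat=2))
    return all([
        any(x in Servers for x in objects),                      # {Servers}
        not (Servers & Clients),                                 # disjoint Servers, Clients
        not (WaitingClients & AssignedClients)                   # partition Clients; ...
        and Clients == WaitingClients | AssignedClients,
        all(implies((x, y) in server,
                    x in AssignedClients and y in Servers)
            for x, y in pairs),                                  # [[server => AssignedClients' & Servers]]
        all(((x, y) in clients) == ((y, x) in server)
            for x, y in pairs),                                  # [[clients <=> ~server]]
        all(implies(x in AssignedClients, count(x, server) == 1)
            for x in objects),                                   # [AssignedClients => card^{=1} server]
        all(implies(x in Servers, count(x, clients) <= 5)
            for x in objects),                                   # [Servers => card^{<=5} clients]
    ])


def p_holds():
    """P = [WaitingClients => [not (clients | server | ~clients | ~server)]]."""
    return all(
        implies(x in WaitingClients,
                not any((x, y) in clients or (x, y) in server
                        or (y, x) in clients or (y, x) in server
                        for y in objects))
        for x in objects)


print(global_invariant(), p_holds())   # True True on this model
```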

Note that all of our translations of constraints in Figure 2 use only two variables, x and y. In fact, our entire example is expressed in the RL² fragment of role logic. In Section 4 we show that RL² corresponds to the decidable fragment C² of two-variable first-order logic with counting, and is therefore decidable. Figure 2 presents the formula P denoting the fact that WaitingClients objects have no incoming or outgoing edges. If we apply the decision procedure for RL², we can show that $GlobalInvariant \Rightarrow P$ is a valid formula, which means that P is a logical consequence of GlobalInvariant. By querying whether GlobalInvariant implies properties of interest such as P, developers can increase their confidence in the correctness and completeness of the design. Moreover, our technique can be used to show the conformance of the program with respect to the design.


proc assignClients() =
  spec old(GlobalInvariant) => !{WaitingClients} &
       [AssignedClients <=> old(AssignedClients | WaitingClients)] &
       GlobalInvariant

proc assignClientsIMPL() = {
  if ({WaitingClients}) {
    cl := getWaitingClient();
    assignOneClientIMPL(cl);
    assignClientsIMPL();
  }}

claim: assignClientsIMPL => assignClients

proc assignOneClient(cl) =
  spec old(GlobalInvariant & [cl => WaitingClients]) =>
       [WaitingClients | cl <=> old(WaitingClients)] &
       [AssignedClients <=> old(AssignedClients) | cl] &
       GlobalInvariant

proc assignOneClientIMPL(cl) = {
  sv := getServer();
  if (Card(sv' & clients) <= 4) {
    WaitingClients := WaitingClients \ cl;
    AssignedClients := AssignedClients | cl;
    cl.server := sv;
    sv.clients := sv.clients | cl;
  } else {
    assignOneClientIMPL(cl);
  }}

claim: assignOneClientIMPL => assignOneClient

proc getWaitingClient() : set =
  spec {WaitingClients} =>
       skip & [returned => WaitingClients]

proc getServer() : set =
  spec {Servers} =>
       skip & [returned => Servers]


Figure 3: A fragment of a program that assigns 
WaitingClients to Servers 


Program fragment. Figure 3 shows a fragment of the 
code of the client-server manager. The top-level procedure 
in the code is a tail-recursive procedure assignClientsIMPL 
that processes all WaitingClients objects and assigns them 
to Servers objects. The assignClientsIMPL procedure ter- 
minates if there are no WaitingClients objects. Otherwise, it 
uses the getWaitingClient procedure to obtain an element 
of WaitingClients and assigns it to some Servers object us- 
ing the assignOneClient procedure, and continues with the 
next WaitingClients object using a tail-recursive call. 

The partial correctness of the procedure 
assignClientsIMPL is given using the specification 
assignClients. The requirement that the procedure 
conforms to its specification is stated using the construct 


claim: assignClientsIMPL => assignClients 


The verification of each procedure call site uses only pro- 
cedure specification (summary) instead of the body of the 
procedure, which allows verification of recursive proce- 
dures. In this example, the implementations of procedures 
getWaitingClient and getServer are not available, which 
illustrates the advantage of assume/guarantee reasoning for 
partitioning a verification task. 

Using the translation in Section 5.1, the claim constructs are reduced to verification conditions expressed in role logic. For a large class of constructs presented in Section 5.1, and for our example in particular, the resulting verification conditions belong to the decidable fragment RL² and can therefore be discharged using a decision procedure for RL².

Note that we are able to express detailed specifications 
of the correctness of procedures while remaining in the de- 
cidable logic. For example, the specification assignClients 
ensures that the entire global invariant in Figure 2 is pre- 
served, and that no client objects are lost in the assignment 
process: after assignClients, the set AssignedClients is the 
union of the old value of AssignedClients and the old value 
of WaitingClients, whereas the new value of WaitingClients is 
an empty set. 


3 A Recipe for Role Logic


In this section we motivate role logic by constructing it in several steps. We start with first-order logic encoded in the simply typed lambda calculus; we then move to a notation that refers to each variable by its index. Finally, we impose a rule for implicitly supplying the indices of variables to predicate symbols. In Section 3.6 we summarize the syntax and the semantics of role logic, and in Section 4 we present a decidable sublogic of role logic.


3.1 Lambda Calculus 


Figure 4 presents simply typed lambda calculus with explicit 
type annotations in lambda abstraction (the Church-style 
simply typed lambda calculus [5, Section 3.2]). This calculus 
is our starting point. 

As primitive types we use bool for boolean values and obj for objects. As the only type constructor we use the arrow $\to$. We introduce $rel^k$ as a shorthand type defined by

$rel^0 = bool$
$rel^{k+1} = obj \to rel^k$

Simple types enable us to give a simple set-theoretic semantics to formulas by interpreting lambda abstractions as total functions. The resulting semantics is in Figure 4; the semantics is straightforward because we use lambda calculus itself as our meta-notation.

Syntax:
  Form ::= Vars                   variable lookup
         | Form Form              function application
         | λVars : Type . Form    function abstraction
  Vars = {x, f, ...}

Types:
$$\frac{\Gamma(v) = T}{\Gamma \vdash v : T} \qquad \frac{\Gamma \vdash F_1 : T_1 \to T_2 \quad \Gamma \vdash F_2 : T_1}{\Gamma \vdash F_1\,F_2 : T_2} \qquad \frac{\Gamma[v := T_1] \vdash F : T_2}{\Gamma \vdash (\lambda v{:}T_1.\,F) : T_1 \to T_2}$$

Semantics:
$$[\![v]\!]e = e\,v \qquad [\![F_1\,F_2]\!]e = ([\![F_1]\!]e)\,([\![F_2]\!]e) \qquad [\![\lambda v{:}T.\,F]\!]e = \lambda d.\,[\![F]\!](e[v := d])$$

Figure 4: Church-style Simply Typed Lambda Calculus
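Because the semantics of Figure 4 uses lambda calculus as its own meta-notation, it can be transcribed almost literally into a language with closures. The Python sketch below assumes untyped evaluation (the annotation T is stored but not checked), and the names Var, App, Lam, and evaluate are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class Var:
    name: str


@dataclass
class App:
    fun: "Term"
    arg: "Term"


@dataclass
class Lam:
    var: str
    typ: str        # type annotation T; recorded but not checked in this sketch
    body: "Term"


Term = Union[Var, App, Lam]


def evaluate(t: Term, env: Dict[str, Any]) -> Any:
    """Semantics of Figure 4; env maps variable names to values."""
    if isinstance(t, Var):
        return env[t.name]                                    # [[v]]e = e v
    if isinstance(t, App):
        return evaluate(t.fun, env)(evaluate(t.arg, env))     # ([[F1]]e)([[F2]]e)
    if isinstance(t, Lam):
        return lambda d: evaluate(t.body, {**env, t.var: d})  # λd.[[F]](e[v := d])
    raise TypeError(f"not a term: {t!r}")


# A binary relation as a curried function of type rel^2 = obj -> obj -> bool.
less = lambda a: lambda b: a < b
# (λx:obj. less x y) applied to the value of "one", with y bound to 3.
term = App(Lam("x", "obj", App(App(Var("less"), Var("x")), Var("y"))), Var("one"))
print(evaluate(term, {"less": less, "y": 3, "one": 1}))      # True, since 1 < 3
```

The dictionary update `{**env, t.var: d}` plays the role of `e[v := d]`: it shadows an outer binding without mutating the caller's environment.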


3.2 De Bruijn Notation


An alternative to referring to each bound variable by its name is to refer to each variable by its number, with number 1 denoting the most recently bound variable. This is the idea behind de Bruijn indices for lambda calculus [22, 4]. Figure 5 presents the syntax and the semantics of lambda calculus notation with de Bruijn indices. The environment maps the keyword stack to a stack (i.e., a list) of elements of the domain. If h is an element and l a list, then the notation h : l denotes the list with the head h and the tail l. The abstraction pushes a value onto the stack; the index (k) retrieves the k-th element from the top of the stack.
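A minimal Python sketch of the semantics in Figure 5, assuming that the environment is reduced to its stack component (relation bindings are omitted) and that type annotations on abstractions are dropped; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Idx:
    i: int          # (i): de Bruijn index, 1 = most recently bound


@dataclass
class App:
    fun: "Term"
    arg: "Term"


@dataclass
class Lam:
    body: "Term"    # λ:T.F with the type T omitted


Term = Union[Idx, App, Lam]


def nth(i, lst):
    # nth 1 (h : l) = h ;  nth (i+1) (h : l) = nth i l
    return lst[0] if i == 1 else nth(i - 1, lst[1:])


def evaluate(t, stack):
    """The list `stack` is (e stack), top element first."""
    if isinstance(t, Idx):
        return nth(t.i, stack)                              # get i e
    if isinstance(t, App):
        return evaluate(t.fun, stack)(evaluate(t.arg, stack))
    if isinstance(t, Lam):
        return lambda d: evaluate(t.body, [d] + stack)      # push d e
    raise TypeError(f"not a term: {t!r}")


K = Lam(Lam(Idx(2)))            # λ.λ.(2): returns its first argument
print(evaluate(K, [])("a")("b"))   # a
```

In `K`, the index (2) skips the most recently pushed argument "b" and retrieves "a", which is exactly the "k-th element from the top" reading.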


3.3 Predicate Logic in Lambda Calculus 


We next encode first-order logic with equality in lambda calculus. We use EQ to denote the binary equality relation. We assume that the interpretation of relation symbols is specified in the environment e. We introduce conjunction and negation as logical operations acting on booleans (the remaining propositional operations are defined in terms of $\land$ and $\neg$, as usual). We use the abstraction in lambda calculus to encode bound variables of predicate calculus. This is the usual higher-order logic encoding of classical first-order logic, as used, for example, in the Isabelle interactive theorem prover [58]. Figure 6 presents this encoding of quantifiers.


Syntax:
  Form ::= (Nat)            variable lookup,   Nat = {1, 2, ...}
         | Form Form        function application
         | λ:Type.Form      function abstraction

Semantics:
$$[\![(i)]\!]e = get\ i\ e \qquad [\![F_1\,F_2]\!]e = ([\![F_1]\!]e)\,([\![F_2]\!]e) \qquad [\![\lambda{:}T.\,F]\!]e = \lambda d.\,[\![F]\!](push\ d\ e)$$

Auxiliary Functions:
  get i e = nth i (e stack)
  push d e = e[stack := d : (e stack)]
  nth 1 (h : l) = h
  nth (i+1) (h : l) = nth i l

Figure 5: De Bruijn Form of Simply Typed Lambda Calculus

Figure 6: First-Order Logic in Lambda Calculus


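Quantifier Brackets:
  $\{F\} = \exists(\lambda{:}obj.\,F)$
  $[F] = \neg\{\neg F\}$

Default Argument Rule:
  When $\Gamma(r) = rel^k$, write $r$ instead of $r(k)(k-1)\ldots(1)$.

Shorthands:
  ${\sim}F = (\lambda\lambda F)(1)(2)$
  $F' = (\lambda\lambda F)(2)(2)$
  $\mathrm{card}^{\ge k} F = \{^k\ (\lambda F)(1) \land \ldots \land (\lambda F)(k) \land \bigwedge_{1 \le i < j \le k} \neg EQ(i)(j)\ \}^k$
  $\mathrm{card}^{=k} F = \mathrm{card}^{\ge k} F \land \neg\,\mathrm{card}^{\ge k+1} F$
  $(\sum_i \mathrm{Card}\, F_i) \ge k \;=\; \bigvee_{\sum_i k_i = k} \bigwedge_i \mathrm{card}^{\ge k_i} F_i$
  $(\sum_i \mathrm{Card}\, F_i) = k \;=\; \bigvee_{\sum_i k_i = k} \bigwedge_i \mathrm{card}^{= k_i} F_i$
  $\mathrm{disjoint}\ F_1,\ldots,F_n = [\bigwedge_{1 \le i < j \le n} \neg(F_i \land F_j)]$
  $\mathrm{partition}\ F;\ F_1,\ldots,F_n = \mathrm{disjoint}\ F_1,\ldots,F_n \land [F \Leftrightarrow F_1 \lor \ldots \lor F_n]$
  $F_1 \setminus F_2 = F_1 \land \neg F_2$

Figure 7: de Bruijn form of Predicate Calculus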


To remain within first-order logic, we require the quantifier $\exists$ to have the monomorphic type $(obj \to bool) \to bool$ (see also Section 3.7).


3.4 Implicit De Bruijn Indices 


Figure 7 shows how we combine the encoding of first-order 
logic in higher-order logic and de Bruijn’s notation for 
lambda calculus. 


Example 1 The first-order predicate calculus formula

$\forall x.\forall y.\ f(x,y) \Rightarrow A(x) \land B(y)$

can be written in this notation as

$[[f(2)(1) \Rightarrow A(2) \land B(1)]]$

The outermost [ ] bracket acts as the quantifier $\forall x$; the variable x is referred to inside the formula as (2) because it is the second innermost bound variable. The innermost [ ] bracket acts as $\forall y$; the variable y is referred to as (1).
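To make Example 1 concrete, the following Python sketch evaluates the de Bruijn version over a small finite domain, treating the bracket [ ] as a universal quantifier that pushes each object onto the stack, and compares the result with the named version. The domain and the interpretations of A, B, and f are hypothetical.

```python
from itertools import product

DOMAIN = [0, 1, 2]

# Hypothetical interpretations of the unary symbols A and B.
A = {0, 1}
B = {1, 2}


def idx(i):
    """(i): the i-th element from the top of the de Bruijn stack (1 = top)."""
    return lambda stack: stack[i - 1]


def bracket(body):
    """[F]: universal quantifier; pushes each object d of DOMAIN onto the stack."""
    return lambda stack: all(body([d] + stack) for d in DOMAIN)


def example1_debruijn(f):
    # [[ f(2)(1) => A(2) & B(1) ]]
    def matrix(stack):
        x, y = idx(2)(stack), idx(1)(stack)
        return (x, y) not in f or (x in A and y in B)
    return bracket(bracket(matrix))([])


def example1_named(f):
    # forall x. forall y. f(x, y) => A(x) & B(y)
    return all((x, y) not in f or (x in A and y in B)
               for x, y in product(DOMAIN, repeat=2))


f_ok = {(0, 1), (1, 2)}
f_bad = {(2, 1)}                                       # A(2) fails
print(example1_debruijn(f_ok), example1_named(f_ok))    # True True
print(example1_debruijn(f_bad), example1_named(f_bad))  # False False
```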




The interpretation environment e contains both the stack for de Bruijn indices and the bindings of relation symbols such as A and f in Example 1. Relation symbols of predicate logic correspond to variables of type $rel^k$. We use the abstraction over de Bruijn indices $\lambda{:}T.F$ only when $T = obj$, and write this abstraction simply $\lambda F$. For every environment e, the value (e stack) is a list of elements of type obj.

We next introduce the Default Argument Rule: we omit de Bruijn indices from the expression $r(k)(k-1)\ldots(1)$ when $r$ is a relation symbol, that is, when $\Gamma(r) = rel^k$. We interpret every occurrence of a variable $r$ with $\Gamma(r) = rel^k$ as $r(k)(k-1)\ldots(1)$.


Example 2 The Default Argument Rule means that instead of

$[[f(2)(1) \Rightarrow A(2) \land B(1)]]$

we write

$[[f \Rightarrow A(2) \land B]]$
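The Default Argument Rule is a purely syntactic expansion. A minimal sketch in Python, assuming a tuple-based formula representation of our own choosing (tags "rel", "app", "all", "not", "and", "implies"), expands bare relation symbols and reproduces the two formulas of Example 2.

```python
def expand(t, arity):
    """Default Argument Rule: a bare relation symbol r with arity[r] = k
    becomes r(k)(k-1)...(1); explicit applications are left unchanged."""
    tag = t[0]
    if tag == "rel":
        k = arity[t[1]]
        return ("app", t[1], list(range(k, 0, -1)))
    if tag == "app":
        return t
    if tag in ("all", "not"):
        return (tag, expand(t[1], arity))
    if tag in ("and", "implies"):
        return (tag, expand(t[1], arity), expand(t[2], arity))
    raise ValueError(f"unknown constructor {tag!r}")


def show(t):
    """Print a formula; no parentheses are emitted, so & is read as binding tighter than =>."""
    tag = t[0]
    if tag == "rel":
        return t[1]
    if tag == "app":
        return t[1] + "".join(f"({i})" for i in t[2])
    if tag == "all":
        return "[" + show(t[1]) + "]"
    if tag == "not":
        return "!" + show(t[1])
    op = {"and": " & ", "implies": " => "}[tag]
    return show(t[1]) + op + show(t[2])


# Example 2: [[f => A(2) & B]] with f binary and A, B unary.
short = ("all", ("all", ("implies", ("rel", "f"),
                         ("and", ("app", "A", [2]), ("rel", "B")))))
arity = {"f": 2, "A": 1, "B": 1}
print(show(short))                 # [[f => A(2) & B]]
print(show(expand(short, arity)))  # [[f(2)(1) => A(2) & B(1)]]
```

Note that `A(2)` is written explicitly and therefore survives the expansion unchanged, while the bare `B` becomes `B(1)`.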


We lose no expressive power by the Default Argument Rule. For example, if we wish to denote $r(i_3)(i_2)(i_1)$, we write $(\lambda\lambda\lambda r)(i_3)(i_2)(i_1)$. Note that the Default Argument Rule applies only to relation symbols, not to all subformulas, so $(\lambda\lambda\lambda r)$ with the Default Argument Rule is equivalent to $r$ without the Default Argument Rule. In general, if $r$ is an $n$-ary relation, we write $((\lambda)^n r)(i_n)(i_{n-1})\ldots(i_1)$ where we would previously write $r(i_n)(i_{n-1})\ldots(i_1)$.


3.5 Shorthands

Figure 7 introduces some shorthands. Tilde ~ swaps the two topmost stack elements (1) and (2). Prime ′ replaces the top (1) with the element (2). An expression $\mathrm{card}^{\ge k} F$, for an integer $k > 0$, corresponds to a counting quantifier in first-order logic [30]. A counting quantifier states that the number of elements with some property is greater than or equal to $k$. Figure 7 also introduces the shorthand $\mathrm{card}^{=k} F$ and the shorthand Card for specifying a constraint on a sum of cardinalities. The shorthands containing $\le$ are defined similarly.
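The shorthands ~, ′ and card^≥k act directly on the stack of de Bruijn values. A Python sketch, assuming formulas are modeled as Python functions from a stack (top element first) to bool, and using a hypothetical relation f over the domain {0, 1, 2, 3}; as in the lambda encodings (λλF)(1)(2) and (λλF)(2)(2), the two new values are pushed on top of the whole old stack.

```python
DOMAIN = range(4)

# Hypothetical binary relation f; pairs are (source, target).
f = {(0, 1), (0, 2), (1, 2)}


def rel2(r):
    """Binary relation symbol r, read by the Default Argument Rule as r(2)(1)."""
    return lambda s: (s[1], s[0]) in r


def inverse(F):
    """~F = (λλF)(1)(2): F sees the old (1) as (2) and the old (2) as (1)."""
    return lambda s: F([s[1], s[0]] + s)


def prime(F):
    """F' = (λλF)(2)(2): both (1) and (2) refer to the old (2)."""
    return lambda s: F([s[1], s[1]] + s)


def card_at_least(k, F):
    """card^{>=k} F: at least k objects o such that F holds with o pushed on top."""
    return lambda s: sum(1 for o in DOMAIN if F([o] + s)) >= k


# Stacks list the top element first: [y, x] means (1) = y and (2) = x.
print(rel2(f)([1, 0]))                  # f(0, 1): True
print(inverse(rel2(f))([0, 1]))         # ~f with (2) = 1, (1) = 0, i.e. f(0, 1): True
print(card_at_least(2, rel2(f))([0]))   # object 0 has two f-successors: True
print(card_at_least(2, rel2(f))([1]))   # object 1 has one f-successor: False
```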

These shorthands serve two purposes. On the one hand, they allow expressing certain properties more concisely. On the other hand, if we use the shorthands but give up the ability to refer to indices explicitly, we obtain a fragment of first-order logic that is equivalent to two-variable first-order logic with counting (Section 4) and is therefore decidable [30].


Example 3 Using the shorthands, we write the formula

$\forall x.\forall y.\ f(x,y) \Rightarrow A(x) \land B(y)$

as

$[[f \Rightarrow A' \land B]]$


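The convenience of role logic is even more evident in larger formulas like

$\forall x.\ A(x) \Rightarrow (\forall y.\ f(x,y) \Rightarrow B(y) \lor C(y)) \land (\forall z.\ g(x,z) \Rightarrow D(z))$

which can be written as

$[A \Rightarrow [f \Rightarrow B \lor C] \land [g \Rightarrow D]]$    (1)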


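$F^* = rtrancl\ (\lambda\lambda F)(2)(1)$
$[\![rtrancl]\!]\ r\ x\ y = \exists n \ge 0.\ \exists x_0,\ldots,x_n.\ x_0 = x \land x_n = y \land \bigwedge_{i=0}^{n-1} r\ x_i\ x_{i+1}$
$F_1 \circ F_2 = \{(\lambda\lambda F_1)(3)(1) \land (\lambda\lambda F_2)(1)(2)\}$
$F^+ = F \circ F^*$
$\mathrm{acyclic}\ F = \neg\{F^+ \land EQ\}$
$\mathrm{tree}\ F_1,\ldots,F_n = \mathrm{acyclic}\ \bigvee_{i=1}^n F_i \land [(\bigvee_{i=1}^n F_i)^* \Rightarrow \sum_{i=1}^n \mathrm{Card}({\sim}F_i) \le 1]$

Figure 8: Transitive Closure Construct and Shorthands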


Formulas of the form (1) are useful for describing properties of first-order structures that arise in shape analysis; see, e.g., [48, 47, 71].




For additional expressive power we introduce the reflexive-transitive closure operator *, with the semantics in Figure 8. We also introduce a shorthand for relation composition. The relation composition shorthand works both when $F_1$ and $F_2$ denote binary relations, in which case the resulting expression denotes a binary relation, and when $F_1$ denotes a set and $F_2$ denotes a binary relation, in which case the resulting expression denotes the image of $F_1$ under $F_2$. For the case of relation image we also introduce a simpler definition in Figure 13, whose advantage is that it uses only two implicit indices.
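Over a finite domain, the semantics of rtrancl in Figure 8 can be computed by repeatedly extending paths by one edge until no new pairs appear. The following Python sketch assumes relations are given as sets of pairs; the function names are illustrative, and F⁺ is computed as F ∘ F* as in Figure 8.

```python
def rtrancl(r, domain):
    """Reflexive-transitive closure: pairs (x, y) joined by an r-path
    x = x0, x1, ..., xn = y of length n >= 0."""
    closure = {(x, x) for x in domain}     # paths of length n = 0
    frontier = set(closure)
    while frontier:
        # extend every newly found path by one r-edge
        step = {(x, z) for (x, y) in frontier for (y2, z) in r if y == y2}
        frontier = step - closure
        closure |= frontier
    return closure


def compose(r1, r2):
    """Relation composition: (x, y) such that r1(x, z) and r2(z, y) for some z."""
    return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}


def tclosure(r, domain):
    """F+ = F o F*: paths of length at least one."""
    return compose(r, rtrancl(r, domain))


chain = {(1, 2), (2, 3)}
print(sorted(rtrancl(chain, {1, 2, 3})))
print(sorted(tclosure(chain, {1, 2, 3})))   # [(1, 2), (1, 3), (2, 3)]

cycle = {(1, 2), (2, 1)}
# an object lies on a cycle exactly when (x, x) is in F+
print(sorted(x for x in {1, 2} if (x, x) in tclosure(cycle, {1, 2})))   # [1, 2]
```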


3.6 Role Logic 


Figure 9 summarizes the syntax of role logic. The semantics 
of role logic follows from Section 3. 

We next explain the purpose of lambda abstraction in 
our logic. 


3.7 Lambda Calculus for Predicate Definitions 


In the resulting role logic of Figure 9 we retain the named variables in the environment, and we allow abstraction over those named variables. As a result, there are two kinds of lambda abstraction: abstraction over de Bruijn indices and abstraction over named variables. Abstraction over a de Bruijn index is always over (1), which denotes an object of type obj; such abstraction is written $\lambda F$. The abstraction over a named variable may abstract over variables of more complex types and is written $\lambda x{:}T.F$. There is only one kind of lambda calculus application; both $(\lambda F_1)\,F_2$ and $(\lambda x{:}T.F_1)\,F_2$ are redexes.

The purpose of the named lambda abstraction $\lambda x{:}T.F$ is twofold. First, when $T = obj$, we can write $\exists(\lambda x{:}obj.F)$ as $\exists x.F$, as in the usual first-order predicate calculus. Second, when $T$ is not obj, we can encode acyclic definitions of higher-order predicates that can be subsequently substituted away. Define the expression

$let\ P : T = F_1\ in\ F_2$


Form ::= Vars                  named object or predicate
       | (Nat)                 de Bruijn index of an object variable
       | EQ                    equality between (1) and (2)
       | Form ∧ Form           conjunction
       | ¬Form                 negation
       | ∃Form                 existential quantification over objects
       | λForm                 de Bruijn abstraction over objects
       | λVars : Type . Form   abstraction over named variables
       | Form Form             function application
       | Form′                 let (1) be (2) in F
       | ~Form                 relation inverse
       | card^{≥k} Form        at least k objects satisfy F
       | Form*                 reflexive transitive closure

Figure 9: The Syntax of Role Logic


to be equivalent to

$(\lambda P{:}T.\, F_2)\, F_1$


Such definitions are very useful for describing complex data 
structures. 

Note that acyclic definitions introduced through typed lambda calculus via bindings $\lambda x{:}T.F$ for $T \ne bool$ do not make the logic higher-order, because we define the quantifier $\exists$ to always have the monomorphic type $(obj \to bool) \to bool$, and the reflexive-transitive closure operator * to have the type

$(obj \to obj \to bool) \to (obj \to obj \to bool)$


Consider a well-typed formula $F$ whose only free variables are relation symbols, and whose de Bruijn indices only refer to indices bound in the formula. Assume that we have applied the Default Argument Rule, so that all de Bruijn indices are explicit. Then we may treat de Bruijn abstraction as the usual abstraction over a disjoint set of variables. By strong normalization of simply typed lambda calculus [5], let $F^\circ$ be the normal form of $F$. We claim that in $F^\circ$ the only occurrences of lambda abstraction are within expressions of the form $\exists(\lambda x{:}obj.F')$ or $rtrancl(\lambda x{:}obj.\lambda y{:}obj.F)$.

To show the claim, consider an occurrence of $\lambda x{:}obj.F_0$ in $F^\circ$. Let $F_1$ be the largest enclosing term of the form $\lambda x_1{:}T_1.\ldots\lambda x_n{:}T_n.\lambda x{:}obj.F_0$. Then $F_1$ cannot be the entire $F^\circ$, because $F^\circ$ has type bool by subject reduction. $F_1$ cannot occur in function position in some application $F_1 F_2$, because $F_1 F_2$ would constitute a redex and $F^\circ$ is in normal form. Hence, $F_1$ can only occur in an expression of the form $F_3 F_1$. Let us consider the "spine" [38] of $F_3 F_1$, so $F_3 = F_n F_{n-1} \ldots F_4$, $n \ge 3$, and $F_n$ is not an application. $F_n$ is not an abstraction, because $F^\circ$ is in normal form. Hence, $F_n$ can only be a variable or a constant.

The only variables or constants that can, by the typing rules, be applied to an abstraction $F_1$ are $\exists$ and rtrancl, so either $F_n = \exists$ or $F_n = rtrancl$.

Consider the case $F_n = \exists$. By the type of $\exists$, we conclude $F_3 = F_n$ and $F_1 = \lambda x{:}obj.F_0$, as desired.

Consider the case $F_n = rtrancl$. Then $F_3 = F_n$ and $F_1 = \lambda u{:}obj.\lambda v{:}obj.G$, so either $u = x$ and $F_1 = \lambda x{:}obj.F_0$ where $F_0 = \lambda v{:}obj.G$, or $v = x$ and $F_1 = \lambda u{:}obj.\lambda x{:}obj.F_0$. This finishes the proof of the claim.

We conclude that typed lambda calculus allows us to use 
flexible definitions of higher-order predicates to structure 
our specifications while keeping the language first-order, be- 
cause we may substitute away all definitions using strong 
normalization of the typed lambda calculus. 


4 Role Logic Subset RL² and its Decidability

In this section we introduce a subset RL² of role logic (Figure 11) and show its decidability.

To show the decidability of RL², we give translations of formulas between the following four logics:

1. D²: the formulas of first-order logic with counting in which every subformula has at most two free variables (different subformulas may have different free variables);

2. C²: the formulas of two-variable logic with counting, which uses x and y as the only variable names; the satisfiability and finite satisfiability problems for C² were shown to be decidable in [30], and the satisfiability problem for C² was shown to be NEXPTIME-complete in [57];

3. I²: the de Bruijn version of two-variable logic with counting, which uses only the de Bruijn indices (1) and (2);

4. RL²: a subset of role logic that contains no explicit de Bruijn indices.


Figure 10 sketches the idea of the proof of equivalence of these four logics. We give translations of formulas from D² to C² (Section 4.2, Figure 15), from C² to I² (Section 4.3, Figure 18), from I² to RL² (Section 4.3, Figure 19), and from RL² to D² (Section 4.4, Figure 20). These translations imply that the satisfiability problems for these four logics are equivalent, so by the decidability of C² [30] we conclude that all these logics are decidable.


[Figure 10 diagram: a cycle of translations D² → C² (Fig. 15), C² → I² (Fig. 18), I² → RL² (Fig. 19), RL² → D² (Fig. 20).]

Figure 10: Showing Equivalence of Four Logics.


Form ::= Vars               binary or unary relation symbol
       | EQ                 equality between (1) and (2)
       | Form ∧ Form        conjunction
       | ¬Form              negation
       | Form′              let (1) be (2) in F
       | ~Form              relation inverse
       | card^{≥k} Form     at least k objects satisfy F

Figure 11: The Syntax of the RL² Subset of Role Logic


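$Nat_2 = \{1, 2\}$
$e : Nat_2 \to obj$

$[\![A]\!]e = [\![A]\!](e\,1)$
$[\![f]\!]e = [\![f]\!](e\,2,\ e\,1)$
$[\![EQ]\!]e = (e\,2 = e\,1)$
$[\![F_1 \land F_2]\!]e = ([\![F_1]\!]e) \land ([\![F_2]\!]e)$
$[\![\neg F]\!]e = \neg([\![F]\!]e)$
$[\![F']\!]e = [\![F]\!](e[1 \mapsto e\,2])$
$[\![{\sim}F]\!]e = [\![F]\!](e[1 \mapsto e\,2,\ 2 \mapsto e\,1])$
$[\![\mathrm{card}^{\ge k} F]\!]e = |\{o \mid [\![F]\!](e[1 \mapsto o,\ 2 \mapsto e\,1])\}| \ge k$

Figure 12: The Semantics of RL²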


quantifiers:
  $\{F\} = \mathrm{card}^{\ge 1} F$
  $[F] = \neg\{\neg F\}$
relation image:
  $F_s \circ F_r = \{F_s \land {\sim}F_r\}$
weakest precondition:
  $wp\ F_r\ F_s = [F_r \Rightarrow F_s]$

Figure 13: Some Shorthands for RL²


4.1 The Role Logic Subset RL²

Figure 11 presents the two-variable role logic RL². Compared to the full role logic in Figure 9, RL² omits the constructs for creating definitions, the constructs for explicitly referring to object variables, and transitive closure. Figure 12 summarizes the semantics of RL²; this semantics is in accordance with the semantics of the full role logic derived in Section 3. Figure 13 defines shorthands that illustrate some constructs definable in RL².

We show that RL² has precisely the same expressive power as the set of formulas of the logic C², which is shown decidable in [30] over the set of all models, as well as over the set of finite models.
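Because every RL² environment has just two slots, the semantics in Figure 12 yields a very small evaluator for finite structures. The sketch below is not a decision procedure (it evaluates a formula in one given finite model); the tuple encoding, the Model class, and helper names such as Wp are our own, and the example model is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    domain: list
    unary: dict = field(default_factory=dict)    # name -> set of objects
    binary: dict = field(default_factory=dict)   # name -> set of (source, target)


def holds(F, m, e1, e2):
    """[[F]]e of Figure 12, with e(1) = e1 and e(2) = e2.  Formulas:
    ("rel", name), ("EQ",), ("and", F, G), ("not", F),
    ("prime", F) for F', ("inv", F) for ~F, ("card", k, F) for card^{>=k} F."""
    tag = F[0]
    if tag == "rel":
        name = F[1]
        if name in m.unary:
            return e1 in m.unary[name]                # A(e 1)
        return (e2, e1) in m.binary[name]            # f(e 2, e 1)
    if tag == "EQ":
        return e2 == e1
    if tag == "and":
        return holds(F[1], m, e1, e2) and holds(F[2], m, e1, e2)
    if tag == "not":
        return not holds(F[1], m, e1, e2)
    if tag == "prime":
        return holds(F[1], m, e2, e2)                # e[1 -> e 2]
    if tag == "inv":
        return holds(F[1], m, e2, e1)                # e[1 -> e 2, 2 -> e 1]
    if tag == "card":
        k, G = F[1], F[2]
        # e[1 -> o, 2 -> e 1]
        return sum(1 for o in m.domain if holds(G, m, o, e1)) >= k
    raise ValueError(tag)


# Shorthands of Figure 13 (plus implication), built from the core constructs.
def Not(F): return ("not", F)
def And(F, G): return ("and", F, G)
def Implies(F, G): return Not(And(F, Not(G)))
def Exists(F): return ("card", 1, F)                   # {F}
def Forall(F): return Not(Exists(Not(F)))              # [F]
def Image(Fs, Fr): return Exists(And(Fs, ("inv", Fr)))  # Fs o Fr
def Wp(Fr, Fs): return Forall(Implies(Fr, Fs))         # wp Fr Fs


m = Model(domain=["a", "b", "c"], unary={"S": {"a"}},
          binary={"r": {("a", "b"), ("b", "c")}})
S, r = ("rel", "S"), ("rel", "r")
print([o for o in m.domain if holds(Image(S, r), m, o, o)])   # ['b']
print([o for o in m.domain if holds(Wp(r, S), m, o, o)])      # ['c']
# [[r => !EQ]]: r is irreflexive; a closed formula, so e is irrelevant
print(holds(Forall(Forall(Implies(r, Not(("EQ",))))), m, "a", "a"))   # True
```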


4.2 Two-Variable Logics C² and D²


Figure 14 presents the logic C² [30]. The logic C² is first-order logic with equality and counting, restricted to formulas that contain only two fixed variable names x and y.

In this section we argue that a more flexible restriction on variable names yields a logic with the same definable relations. Let $FV(F)$ denote the free variables of formula $F$.


Definition 4 A D² formula is a formula $F$ of first-order logic with counting such that $|FV(G)| \le 2$ for every subformula $G$ of $F$.
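Definition 4 is easy to check mechanically. A sketch in Python, assuming a tuple encoding of counting formulas chosen for illustration; the example formula `phi` uses three variable names x, y, z, yet every subformula has at most two free variables, so it is a D² formula that is not syntactically a C² formula.

```python
def fv(F):
    """Free variables.  Formulas: ("atom", R, v1, ..., vn), ("eq", u, v),
    ("not", F), ("and", F, G), ("count", k, x, F) for ∃^{>=k} x. F."""
    tag = F[0]
    if tag == "atom":
        return set(F[2:])
    if tag == "eq":
        return {F[1], F[2]}
    if tag == "not":
        return fv(F[1])
    if tag == "and":
        return fv(F[1]) | fv(F[2])
    if tag == "count":
        return fv(F[3]) - {F[2]}
    raise ValueError(tag)


def subformulas(F):
    yield F
    tag = F[0]
    if tag == "not":
        yield from subformulas(F[1])
    elif tag == "and":
        yield from subformulas(F[1])
        yield from subformulas(F[2])
    elif tag == "count":
        yield from subformulas(F[3])


def is_d2(F):
    """Definition 4: every subformula has at most two free variables."""
    return all(len(fv(G)) <= 2 for G in subformulas(F))


# ∃x. A(x) ∧ ∃y. (f(x, y) ∧ ∃z. g(y, z))
phi = ("count", 1, "x",
       ("and", ("atom", "A", "x"),
               ("count", 1, "y",
                ("and", ("atom", "f", "x", "y"),
                        ("count", 1, "z", ("atom", "g", "y", "z"))))))
# f(x, y) ∧ g(y, z) has three free variables
psi = ("and", ("atom", "f", "x", "y"), ("atom", "g", "y", "z"))
print(is_d2(phi), is_d2(psi))   # True False
```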


Clearly every C² formula is a D² formula, but not vice versa, because the set of possible variables that may occur in D² formulas is countably infinite. The syntactic restriction on variables in Definition 4 is more general than in the definition of C², which makes D² more convenient for writing readable formulas.

We show that every D² formula is equivalent to a C² formula (modulo the renaming of free variables). Up to one technical detail, it suffices to rename bound variables in a D² formula to obtain a C² formula. We therefore derive the equivalence of D² and C² as a consequence of an observation about lambda calculus terms.


Definition 5 Define the set of lambda calculus terms 2VarTerms as the smallest set that satisfies the following conditions:

1. $v \in \mathit{2VarTerms}$ if $v$ is a variable, and $c \in \mathit{2VarTerms}$ if $c$ is a constant;

2. if $T_1, T_2 \in \mathit{2VarTerms}$ and $|FV(T_1) \cup FV(T_2)| \le 2$, then $(T_1 T_2) \in \mathit{2VarTerms}$;


Vars2 = {x, y}

Form ::= A(Vars2)               atomic formula with unary relation A
       | f(Vars2, Vars2)        atomic formula with binary relation f
       | Vars2 = Vars2          equality between objects
       | Form ∧ Form            conjunction
       | ¬Form                  negation
       | ∃^{≥k} Vars2 . Form    at least k objects satisfy formula

Figure 14: The Syntax of Two-Variable Logic with Counting C²


3. if $T \in \mathit{2VarTerms}$, $v$ is a variable, and $|FV(T) \cup \{v\}| \le 2$, then $\lambda v.T \in \mathit{2VarTerms}$.


From Definition 5 it follows that if $T \in \mathit{2VarTerms}$, then $|FV(T_1)| \le 2$ for every subterm $T_1$ of $T$. Moreover, if $\lambda v.T \in \mathit{2VarTerms}$ and $v \notin FV(T)$, then $|FV(T)| \le 1$.
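Definition 5 translates directly into a recursive membership test. A minimal Python sketch, assuming lambda terms are encoded as tuples and using a hypothetical constant f; the second term fails because its binder λz adds a third variable even though z does not occur in the body, which is exactly the situation that condition (2) later in this section rules out.

```python
def fv(t):
    """Free variables of ("var", v), ("const", c), ("app", T1, T2), ("lam", v, T)."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    if tag == "lam":
        return fv(t[2]) - {t[1]}
    raise ValueError(tag)


def in_2varterms(t):
    """Membership in 2VarTerms (Definition 5)."""
    tag = t[0]
    if tag in ("var", "const"):
        return True
    if tag == "app":
        return (in_2varterms(t[1]) and in_2varterms(t[2])
                and len(fv(t[1]) | fv(t[2])) <= 2)
    if tag == "lam":
        return in_2varterms(t[2]) and len(fv(t[2]) | {t[1]}) <= 2
    raise ValueError(tag)


def app(*ts):
    """Left-nested application: app(f, a, b) = ((f a) b)."""
    result = ts[0]
    for t in ts[1:]:
        result = ("app", result, t)
    return result


f = ("const", "f")
ok = ("lam", "x", app(f, ("var", "x"), ("var", "y")))    # λx. f x y
bad = ("lam", "z", app(f, ("var", "x"), ("var", "y")))   # λz. f x y, z not free
print(in_2varterms(ok), in_2varterms(bad))   # True False
```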

We next define the set $capt(v, F)$ of those bound variables $z$ in formula $F$ such that $v$ occurs in the scope of a binding of $z$.


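Definition 6

$$
\begin{aligned}
capt(v, u) &= \emptyset, && \text{if } u \text{ is a variable} \\
capt(v, F_1 F_2) &= capt(v, F_1) \cup capt(v, F_2) \\
capt(v, \lambda u.F) &= \begin{cases} capt(v, F) \cup \{u\}, & \text{if } v \in FV(\lambda u.F) \\ \emptyset, & \text{otherwise} \end{cases}
\end{aligned}
$$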
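Definition 6 can be transcribed in the same kind of tuple encoding of lambda terms, which is assumed here for illustration. Constants are treated like variables and contribute no captured binders; this is an assumption on our part, consistent with representing logical operations as constants.

```python
def fv(t):
    """Free variables of ("var", v), ("const", c), ("app", T1, T2), ("lam", v, T)."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(t[1]) | fv(t[2])
    if tag == "lam":
        return fv(t[2]) - {t[1]}
    raise ValueError(tag)


def capt(v, t):
    """Definition 6: bound variables whose binders have the free variable v in scope."""
    tag = t[0]
    if tag in ("var", "const"):       # constants treated like variables (assumption)
        return set()
    if tag == "app":
        return capt(v, t[1]) | capt(v, t[2])
    if tag == "lam":
        u, body = t[1], t[2]
        return (capt(v, body) | {u}) if v in fv(t) else set()
    raise ValueError(tag)


# λx. λy. f u y
t = ("lam", "x", ("lam", "y",
                  ("app", ("app", ("const", "f"), ("var", "u")), ("var", "y"))))
print(sorted(capt("u", t)))   # ['x', 'y']
print(capt("w", t))           # set(): w does not occur free in t
```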


As usual, we say that $T$ and $T'$ are α-equivalent if $T'$ can be obtained from $T$ by renaming bound variables.


Lemma 7 For every $T \in \mathit{2VarTerms}$ with $FV(T) \subseteq \{u,v\}$ there exists a term $T' = norm(T)$ such that $T'$ is α-equivalent to $T$, all bound variables in $T'$ are among $\{x,y\}$, and either

1. $capt(u,T') \subseteq \{x\}$ and $capt(v,T') \subseteq \{y\}$, or
2. $capt(u,T') \subseteq \{y\}$ and $capt(v,T') \subseteq \{x\}$.


Proof. Let FV(T) C {u,v}. Without loss of generality 
we may assume that {u,v}M {x,y} = @. The proof is by 
induction on the structure of terms. 


1. T = wu for a variable u. Let T = T”, clearly 


capt(u, T’) = capt(v, 7”) = 0. 


2. T= 7T;T2. Let Tj = norm(T\) and Tz = norm(T2) by 
induction hypothesis. Assume capt(u,77) C {2} and 
capt(v, 71) C {y} (the other case is symmetric). We 
consider two cases for T3. 


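(a) $\mathit{capt}(u, T_2') \subseteq \{x\}$ and $\mathit{capt}(v, T_2') \subseteq \{y\}$. Then let $\mathit{norm}(T) = T_1'\,T_2'$.

(b) $\mathit{capt}(u, T_2') \subseteq \{y\}$ and $\mathit{capt}(v, T_2') \subseteq \{x\}$. Let $T_2''$ be the result of swapping in $T_2'$ all occurrences of the bound variables $x$ and $y$. Then $\mathit{capt}(u, T_2'') \subseteq \{x\}$ and $\mathit{capt}(v, T_2'') \subseteq \{y\}$, so we let $\mathit{norm}(T) = T_1'\,T_2''$.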


In both cases, $\mathit{capt}(u, \mathit{norm}(T)) \subseteq \{x\}$ and $\mathit{capt}(v, \mathit{norm}(T)) \subseteq \{y\}$.


3. $T = \lambda w.T_1$. Since $|\{u, v\}| = 2$ and $|FV(T_1) \cup \{w\}| \leq 2$ by the definition of 2VarTerms, it cannot be the case that both $u \in FV(T_1)$ and $v \in FV(T_1)$. Since $FV(T_1) \subseteq \{u, v, w\}$, we conclude that $FV(T_1) \subseteq \{u, w\}$ or $FV(T_1) \subseteq \{v, w\}$.


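Suppose therefore that $FV(T_1) \subseteq \{u, w\}$ (the case $FV(T_1) \subseteq \{v, w\}$ is symmetric). By the induction hypothesis, let $T_1' = \mathit{norm}(T_1)$. Assume $\mathit{capt}(u, T_1') \subseteq \{x\}$ and $\mathit{capt}(w, T_1') \subseteq \{y\}$ (the case $\mathit{capt}(u, T_1') \subseteq \{y\}$ and $\mathit{capt}(w, T_1') \subseteq \{x\}$ is symmetric). Let $\mathit{norm}(T) = \lambda x.(T_1'[w := x])$; no occurrence of $x$ introduced by the substitution is captured, because $w$ occurs only in the scope of bindings of $y$. Then $\mathit{capt}(u, \mathit{norm}(T)) \subseteq \{x\}$ and $\mathit{capt}(v, \mathit{norm}(T)) = \emptyset \subseteq \{y\}$.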
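The proof of Lemma 7 is constructive, and its three cases can be read as a recursive procedure. The Python sketch below is one possible rendering, using the same hypothetical tuple encoding of lambda terms; it assumes, as the proof does, that the input is in 2VarTerms and that its bound variables differ from u, v, x, and y. The helper names swap, subst, and norm, and the returned flag recording which alternative of the lemma holds, are choices of this illustration.

```python
# Sketch of norm from Lemma 7 (hypothetical tuple encoding:
#   ("var", n) | ("app", t1, t2) | ("lam", n, body)).

def fv(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}


def capt(v, t):
    if t[0] == "var":
        return set()
    if t[0] == "app":
        return capt(v, t[1]) | capt(v, t[2])
    return capt(v, t[2]) | {t[1]} if v in fv(t) else set()


def swap(t):
    """Exchange the names x and y everywhere (binders and occurrences)."""
    s = {"x": "y", "y": "x"}
    if t[0] == "var":
        return ("var", s.get(t[1], t[1]))
    if t[0] == "app":
        return ("app", swap(t[1]), swap(t[2]))
    return ("lam", s.get(t[1], t[1]), swap(t[2]))


def subst(t, w, z):
    """t[w := z]; assumes w is not rebound inside t (true after norm)."""
    if t[0] == "var":
        return ("var", z) if t[1] == w else t
    if t[0] == "app":
        return ("app", subst(t[1], w, z), subst(t[2], w, z))
    return ("lam", t[1], subst(t[2], w, z))


def norm(t, a, b):
    """Return (t2, flipped): t2 is alpha-equivalent to t with bound variables in {x, y}.
    flipped=False: capt(a) ⊆ {x}, capt(b) ⊆ {y}; flipped=True: x and y exchanged."""
    if t[0] == "var":                        # case 1
        return t, False
    if t[0] == "app":                        # case 2: align T2' with T1'
        t1, f1 = norm(t[1], a, b)
        t2, f2 = norm(t[2], a, b)
        return ("app", t1, swap(t2) if f1 != f2 else t2), f1
    w, body = t[1], t[2]                     # case 3: t = λw.body
    partner = b if b in fv(body) else a      # at most one of a, b is free in body
    body2, fl = norm(body, partner, w)
    z = "y" if fl else "x"                   # w is only under binders of the other name,
    res = ("lam", z, subst(body2, w, z))     # so w := z introduces no capture
    # partner is now captured only by z; the other free variable does not occur.
    flipped = (z == "y") if partner == a else (z == "x")
    return res, flipped


T = ("app",
     ("lam", "a", ("app", ("var", "u"), ("var", "a"))),
     ("lam", "b", ("app", ("var", "v"),
                   ("lam", "c", ("app", ("var", "b"), ("var", "c"))))))
result, flipped = norm(T, "u", "v")
print(flipped)  # False
```

For this input the second argument of the top-level application comes back in the opposite orientation and is swapped, exactly as in case 2(b) of the proof.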


To apply Lemma 7 to D² formulas, we represent all logical operations and quantifiers as constants. Variables in a lambda term then correspond to first-order variables. To ensure that the representation of formulas satisfies the condition $|FV(T) \cup \{v\}| \leq 2$ for each term $\lambda v.T$, we require the following condition:


For every formula $\exists^{\geq k} x.\, F$, either $x \in FV(F)$ or $F = \mathit{true}$. (2)


We ensure this condition by applying the rule

$$\exists^{\geq k} x.\, F \;\rightsquigarrow\; F \land \exists^{\geq k} x.\, \mathit{true}$$

for $x \notin FV(F)$.
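A minimal sketch of how condition (2) could be enforced with the rule above, over a hypothetical tuple encoding of D² formulas. It assumes thresholds k ≥ 1: for k = 0 the quantifier is trivially true, and the rule would not preserve meaning.

```python
# Hypothetical tuple encoding of D² formulas (an assumption of this sketch):
#   ("true",) | ("A", name, var) | ("f", name, var1, var2) | ("eq", var1, var2)
#   | ("and", F, G) | ("not", F) | ("ex", k, var, F)      # ex stands for ∃^{≥k}

TRUE = ("true",)


def fv(f):
    tag = f[0]
    if tag == "true":
        return set()
    if tag == "A":
        return {f[2]}
    if tag in ("f", "eq"):
        return set(f[-2:])
    if tag == "and":
        return fv(f[1]) | fv(f[2])
    if tag == "not":
        return fv(f[1])
    if tag == "ex":
        return fv(f[3]) - {f[2]}
    raise ValueError(f"unknown formula: {f!r}")


def ensure_condition_2(f):
    """Rewrite ∃^{≥k}x. F into F ∧ ∃^{≥k}x. true whenever x ∉ FV(F) and F ≠ true
    (assumes k ≥ 1)."""
    tag = f[0]
    if tag == "and":
        return ("and", ensure_condition_2(f[1]), ensure_condition_2(f[2]))
    if tag == "not":
        return ("not", ensure_condition_2(f[1]))
    if tag == "ex":
        k, x, body = f[1], f[2], ensure_condition_2(f[3])
        if x not in fv(body) and body != TRUE:
            return ("and", body, ("ex", k, x, TRUE))
        return ("ex", k, x, body)
    return f  # atomic formulas are unchanged


print(ensure_condition_2(("ex", 2, "x", ("A", "P", "y"))))
# ('and', ('A', 'P', 'y'), ('ex', 2, 'x', ('true',)))
```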

After ensuring condition (2), we apply the translation in Figure 15. Lemma 7 justifies the correctness of the translation. The translated formula is of the same size as the original formula. The translation can clearly be performed in polynomial time, including the process of ensuring condition (2). The translation time can be made close to linear by delaying the application of the substitution $[w := x]$ and the swap operation.


4.3 From C² to RL² via I²


In this section we introduce logic I² (Figure 16). We then give translations from C² to I² (Figure 18), and from I² to RL² (Figure 19).


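$$
\begin{aligned}
T_{DC}[A(v)] &= A(v) \\
T_{DC}[f(u,v)] &= f(u,v) \\
T_{DC}[\lnot F] &= \lnot T_{DC}[F] \\
T_{DC}[F_1 \land F_2] &= \begin{cases}
F_1' \land F_2', & \text{if } \mathit{capt}(u, F_1'), \mathit{capt}(u, F_2') \subseteq \{x\} \text{ and } \mathit{capt}(v, F_1'), \mathit{capt}(v, F_2') \subseteq \{y\}, \\
 & \text{or } \mathit{capt}(u, F_1'), \mathit{capt}(u, F_2') \subseteq \{y\} \text{ and } \mathit{capt}(v, F_1'), \mathit{capt}(v, F_2') \subseteq \{x\} \\
F_1' \land (\mathit{swap}\, F_2'), & \text{otherwise}
\end{cases} \\
&\quad \text{where } FV(F_1 \land F_2) = \{u, v\},\ F_1' = T_{DC}[F_1],\ F_2' = T_{DC}[F_2] \\[4pt]
\mathit{swap}(A(v)) &= A(s\,v) \\
\mathit{swap}(f(u,v)) &= f(s\,u, s\,v) \\
\mathit{swap}(\lnot F) &= \lnot(\mathit{swap}\, F) \\
\mathit{swap}(F_1 \land F_2) &= \mathit{swap}\, F_1 \land \mathit{swap}\, F_2 \\
\mathit{swap}(\exists^{\geq k} v.\, F) &= \exists^{\geq k} (s\,v).\, (\mathit{swap}\, F) \\
&\quad \text{where } s\,x = y,\ s\,y = x,\ s\,u = u \text{ if } u \notin \{x, y\} \\[4pt]
T_{DC}[\exists^{\geq k} w.\, F] &= \begin{cases}
\exists^{\geq k} x.\, (F'[w := x]), & \text{if } \mathit{capt}(u, F') \subseteq \{x\},\ \mathit{capt}(w, F') \subseteq \{y\} \\
\exists^{\geq k} y.\, (F'[w := y]), & \text{if } \mathit{capt}(u, F') \subseteq \{y\},\ \mathit{capt}(w, F') \subseteq \{x\}
\end{cases} \\
&\quad \text{where } FV(F) \subseteq \{u, w\},\ F' = T_{DC}[F]
\end{aligned}
$$

Figure 15: Translation of D² formulas to C² formulas.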


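$$
\begin{array}{rcll}
\mathit{Form} & ::= & A(\langle \mathit{Nat}_2 \rangle) & \text{atomic formula with unary relation } A \\
 & \mid & f(\langle \mathit{Nat}_2 \rangle, \langle \mathit{Nat}_2 \rangle) & \text{atomic formula with binary relation } f \\
 & \mid & \langle \mathit{Nat}_2 \rangle = \langle \mathit{Nat}_2 \rangle & \text{equality between objects} \\
 & \mid & \mathit{Form} \land \mathit{Form} & \text{conjunction} \\
 & \mid & \lnot \mathit{Form} & \text{negation} \\
 & \mid & \mathit{card}^{\geq k}\, \mathit{Form} & \text{at least } k \text{ objects satisfy formula}
\end{array}
$$

Figure 16: The Syntax of Intermediate Logic I²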
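$e : \mathit{Nat}_2 \to \mathit{Vars}_2$

$$
\begin{aligned}
T_{IC}[A(\langle i \rangle)]e &= A(e\,i) \\
T_{IC}[f(\langle i_1 \rangle, \langle i_2 \rangle)]e &= f(e\,i_1, e\,i_2) \\
T_{IC}[\langle i_1 \rangle = \langle i_2 \rangle]e &= (e\,i_1) = (e\,i_2) \\
T_{IC}[F_1 \land F_2]e &= (T_{IC}[F_1]e) \land (T_{IC}[F_2]e) \\
T_{IC}[\lnot F]e &= \lnot(T_{IC}[F]e) \\
T_{IC}[\mathit{card}^{\geq k} F]e &= \exists^{\geq k} v.\, (T_{IC}[F][1 \mapsto v, 2 \mapsto (e\,1)]) \\
&\quad \text{where } v = s(e\,1),\ s\,x = y,\ s\,y = x
\end{aligned}
$$

correctness criterion: $[\![T_{IC}[F]e]\!]e_C = [\![F]\!](e_C \circ e)$

Figure 17: Translating I² formulas to C² formulas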


Intermediate logic. Figure 16 presents logic I². I² is a version of C² that uses two de Bruijn indices instead of variables. We introduce I² to separate the translation of C² formulas to RL² into two phases: the first phase introduces de Bruijn indices, and the second phase introduces the Default Argument Rule.

For the sake of illustration, we first present a converse translation, from I² to C², although we do not need this translation to show the equivalence of D², C², I², and RL².


From I² to C². Figure 17 presents the translation of I² into C². This translation amounts to introducing alternately the variables $x$ and $y$ for each counting quantifier, and resolving the indices appropriately. Using the criterion in Figure 17, the correctness of the translation follows by induction on the structure of formulas.
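Figure 17 can be transcribed almost line by line. In the hypothetical Python encoding below, an environment is a dict from the indices 1 and 2 to the variable names "x" and "y"; each counting quantifier picks the variable not currently bound to index 1, which is what produces the alternation.

```python
# Hypothetical encodings (assumptions of this sketch):
#   I²: ("A", name, i) | ("f", name, i1, i2) | ("eq", i1, i2)
#       | ("and", F, G) | ("not", F) | ("card", k, F)      with i in {1, 2}
#   C²: ("A", name, v) | ("f", name, v1, v2) | ("eq", v1, v2)
#       | ("and", F, G) | ("not", F) | ("ex", k, v, F)     with v in {"x", "y"}

S = {"x": "y", "y": "x"}  # s x = y, s y = x


def t_ic(f, e):
    """Translate an I² formula to C² under an environment e : {1, 2} -> {"x", "y"}."""
    tag = f[0]
    if tag == "A":
        return ("A", f[1], e[f[2]])
    if tag == "f":
        return ("f", f[1], e[f[2]], e[f[3]])
    if tag == "eq":
        return ("eq", e[f[1]], e[f[2]])
    if tag == "and":
        return ("and", t_ic(f[1], e), t_ic(f[2], e))
    if tag == "not":
        return ("not", t_ic(f[1], e))
    if tag == "card":
        v = S[e[1]]  # the variable not currently bound to index 1
        return ("ex", f[1], v, t_ic(f[2], {1: v, 2: e[1]}))
    raise ValueError(f"unknown I² formula: {f!r}")


# card^{≥1} card^{≥2} f(<2>, <1>): the inner <2> refers to the outer quantified object.
print(t_ic(("card", 1, ("card", 2, ("f", "f", 2, 1))), {1: "x", 2: "y"}))
# ('ex', 1, 'y', ('ex', 2, 'x', ('f', 'f', 'y', 'x')))
```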


From C² to I². We turn to the translation from C² to I². Consider the C² formula

$$F = \exists^{\geq 1} y.\, (\exists^{\geq 2} x.\, (\exists^{\geq 3} x.\, P(x, y)) \land Q(x, y))$$


The subformula $P(x, y)$ of $F$ refers to the variable $y$, which is the 3rd bound variable starting from the innermost one. Therefore, the straightforward replacement of variables by de Bruijn indices would require access to $\langle 3 \rangle$. To address this problem, the translation from C² to I² involves a preparatory "alternating transformation" on C² formulas. For every formula $F'$, let $B(F')$ denote some purely propositional combination of $F'$ and perhaps some other formulas. The alternating transformation eliminates all subformulas of the form $\exists^{\geq k_1} v.\, B(\exists^{\geq k_2} v.\, G(v))$ for $v \in \mathit{Vars}_2$. In the resulting formula, the sequence of bound variables along any path in the formula tree is alternating, that is, it satisfies the regular expression $(y \mid \epsilon)(xy)^*(x \mid \epsilon)$.
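The alternation condition can be checked mechanically: collect the sequence of bound variables on each root-to-leaf path of the formula tree and match it against the regular expression above, which describes exactly the words over {x, y} with no two equal adjacent letters. A sketch using Python's re module and a hypothetical tuple encoding (with an "or" node for the disjunction added in the next paragraph):

```python
import re

# Hypothetical C² encoding: ("A", n, v) | ("f", n, v1, v2) | ("eq", v1, v2) | ("true",)
#   | ("and", F, G) | ("or", F, G) | ("not", F) | ("ex", k, v, F)
ALTERNATING = re.compile(r"y?(?:xy)*x?")  # (y|ε)(xy)*(x|ε)


def binder_paths(f, prefix=""):
    """Yield, for each leaf of the formula tree, the bound variables above it."""
    tag = f[0]
    if tag in ("true", "A", "f", "eq"):
        yield prefix
    elif tag in ("and", "or"):
        yield from binder_paths(f[1], prefix)
        yield from binder_paths(f[2], prefix)
    elif tag == "not":
        yield from binder_paths(f[1], prefix)
    elif tag == "ex":
        yield from binder_paths(f[3], prefix + f[2])
    else:
        raise ValueError(f"unknown formula: {f!r}")


def is_alternating(f):
    return all(ALTERNATING.fullmatch(p) for p in binder_paths(f))


# A formula shaped like the example F: the path to P(x, y) binds y, x, x.
F = ("ex", 1, "y", ("ex", 2, "x", ("and", ("ex", 3, "x", ("f", "P", "x", "y")),
                                   ("f", "Q", "x", "y"))))
print(is_alternating(F))  # False
```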

For the purpose of the alternating transformation, we add the disjunction $\lor$ to the language. We show how to eliminate successive quantification over $x$ from $\exists^{\geq k_1} x.\, B(\exists^{\geq k_2} x.\, G)$ (the case of $\exists^{\geq k_1} y.\, B(\exists^{\geq k_2} y.\, G)$ is analogous). First, transform $B$ into a disjunction of canonical conjunctions of formulas $H$, where each $H$ satisfies one of the following three conditions:
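$e : \mathit{Vars}_2 \to \mathit{Nat}_2$

$$
\begin{aligned}
T_{CI}[A(v)]e &= A(\langle e\,v \rangle) \\
T_{CI}[f(v_1, v_2)]e &= f(\langle e\,v_1 \rangle, \langle e\,v_2 \rangle) \\
T_{CI}[v_1 = v_2]e &= \langle e\,v_1 \rangle = \langle e\,v_2 \rangle \\
T_{CI}[F_1 \land F_2]e &= (T_{CI}[F_1]e) \land (T_{CI}[F_2]e) \\
T_{CI}[\lnot F]e &= \lnot(T_{CI}[F]e) \\
T_{CI}[\exists^{\geq k} x.\, F]e &= \mathit{card}^{\geq k}(T_{CI}[F][x \mapsto 1, y \mapsto 2]) \qquad \text{invariant: } e\,y = 1 \\
T_{CI}[\exists^{\geq k} y.\, F]e &= \mathit{card}^{\geq k}(T_{CI}[F][y \mapsto 1, x \mapsto 2]) \qquad \text{invariant: } e\,x = 1
\end{aligned}
$$

correctness criterion: $[\![T_{CI}[F]e]\!]e_I = [\![F]\!](e_I \circ e)$

Figure 18: Translating normalized C² formulas to I² formulas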


C1) $H$ is quantifier-free;

C2) $H$ is of the form $\exists^{\geq k} v.\, G(v)$ for $v \in \mathit{Vars}_2$;

C3) $H$ is of the form $\lnot \exists^{\geq k} v.\, G(v)$ for $v \in \mathit{Vars}_2$.


Let $B = \bigvee_{i=1}^{n} B_i$, where each $B_i$ is a canonical conjunction (cube) of formulas satisfying conditions C1), C2), C3). Because $B_i \land B_j$ is contradictory for distinct cubes $B_i$ and $B_j$, the sets of objects $o$ satisfying different $B_i$ are disjoint, so

$$|\{o \mid [\![B]\!]e[v \mapsto o]\}| = \sum_{i=1}^{n} |\{o \mid [\![B_i]\!]e[v \mapsto o]\}|$$


We can therefore replace the counting quantifier on $B$ with a propositional combination of counting quantifiers on $B_i$ for $1 \leq i \leq n$ (as in quantifier elimination for boolean algebras [67], [49, Section 3.2]). Specifically,

$$\exists^{\geq k} x.\, B \iff \bigvee_{k_1 + \cdots + k_n = k}\ \bigwedge_{i=1}^{n} \exists^{\geq k_i} x.\, B_i \qquad (3)$$
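Equivalence (3) depends only on the disjointness of the cubes, so it can be sanity-checked by brute force on small finite sets. In the Python sketch below, the cubes are modeled directly as disjoint sets of objects (an abstraction of the formulas $B_i$), and all names are assumptions of the illustration.

```python
from itertools import product


def compositions(k, n):
    """All n-tuples of non-negative integers summing to k."""
    if n == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, n - 1):
            yield (first,) + rest


def lhs(k, cubes):
    """∃^{≥k}x. B, where B is the disjunction (union) of the cubes."""
    return len(set().union(*cubes)) >= k


def rhs(k, cubes):
    """Right-hand side of (3): some split k1 + ... + kn = k with ∃^{≥ki}x. Bi for all i."""
    return any(all(len(c) >= ki for c, ki in zip(cubes, ks))
               for ks in compositions(k, len(cubes)))


def check_equation_3(n_objects, n_cubes, max_k):
    """Compare both sides for every placement of objects into pairwise disjoint
    cubes (placement value n_cubes means: the object lies in no cube)."""
    for placement in product(range(n_cubes + 1), repeat=n_objects):
        cubes = [{o for o, c in enumerate(placement) if c == i} for i in range(n_cubes)]
        for k in range(max_k + 1):
            if lhs(k, cubes) != rhs(k, cubes):
                return False
    return True


print(check_equation_3(5, 3, 6))  # True
```

Disjointness matters: for the overlapping cubes {0} and {0} and k = 2, the split 1 + 1 satisfies the right-hand side although only one object satisfies B.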


It is therefore sufficient to eliminate the successive quantification over $x$ in $\exists^{\geq k_1} x.\, B_i(\exists^{\geq k_2} x.\, G)$. Group the conjuncts in $B_i$ as follows. Let $FV(F)$ denote the free variables of formula $F$. Let $P(x)$ be the conjunction of the conjuncts $C$ of $B_i$ such that $x \in FV(C)$, and let $Q$ be the conjunction of all conjuncts $C$ of $B_i$ such that $x \notin FV(C)$. All occurrences of $\exists^{\geq k_2} x.\, G$ in $B_i$ are in $Q$. We have

$$\exists^{\geq k_1} x.\, B_i \iff \exists^{\geq k_1} x.\, (Q \land P(x)) \iff Q \land \exists^{\geq k_1} x.\, P(x)$$


where the last equivalence follows easily from the definition of the counting quantifier $\exists^{\geq k}$. In the resulting formula $Q \land \exists^{\geq k_1} x.\, P(x)$, the subformula $\exists^{\geq k_2} x.\, G$ is in $Q$ and is therefore not in the scope of the original quantifier. By repeating this transformation we ensure that all quantifiers are alternating.


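$$
\begin{aligned}
T_{IR}[A(\langle 1 \rangle)] &= A \\
T_{IR}[A(\langle 2 \rangle)] &= A' \\
T_{IR}[f(\langle 2 \rangle, \langle 1 \rangle)] &= f \\
T_{IR}[f(\langle 1 \rangle, \langle 2 \rangle)] &= {\sim}f \\
T_{IR}[f(\langle 2 \rangle, \langle 2 \rangle)] &= f' \\
T_{IR}[f(\langle 1 \rangle, \langle 1 \rangle)] &= {\sim}(f') \\
T_{IR}[\langle 2 \rangle = \langle 1 \rangle] &= \mathit{EQ} \\
T_{IR}[\langle 1 \rangle = \langle 2 \rangle] &= \mathit{EQ} \\
T_{IR}[\langle 1 \rangle = \langle 1 \rangle] &= \mathit{true} \\
T_{IR}[\langle 2 \rangle = \langle 2 \rangle] &= \mathit{true} \\
T_{IR}[F_1 \land F_2] &= T_{IR}[F_1] \land T_{IR}[F_2] \\
T_{IR}[\lnot F] &= \lnot T_{IR}[F] \\
T_{IR}[\mathit{card}^{\geq k} F] &= \mathit{card}^{\geq k}\, T_{IR}[F]
\end{aligned}
$$

correctness criterion: $[\![T_{IR}[F]]\!]e = [\![F]\!]e$

Figure 19: Translating I² formulas to RL² formulas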


After the alternating transformation, the translation from C² to I² is straightforward, and is presented in Figure 18. The correctness of the translation follows by induction on the structure of formulas. The translation in Figure 18 runs in linear time and produces an I² formula whose size is linear in the size of the original C² formula.
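Figure 18 translates directly into code once the input is alternating. In this Python sketch (hypothetical encoding as in the earlier sketches; environments map "x" and "y" to indices 1 and 2), the invariant of Figure 18 is checked explicitly, so a non-alternating input is rejected instead of being silently mistranslated. Starting from an empty environment for sentences is an assumption of the sketch.

```python
# Hypothetical encodings: C² uses ("ex", k, v, F), I² uses ("card", k, F);
# atoms ("A", n, v), ("f", n, v1, v2), ("eq", v1, v2); connectives "and", "not".
OTHER = {"x": "y", "y": "x"}


def t_ci(f, e):
    """Translate an alternating C² formula to I² under e : {"x", "y"} -> {1, 2}."""
    tag = f[0]
    if tag == "A":
        return ("A", f[1], e[f[2]])
    if tag == "f":
        return ("f", f[1], e[f[2]], e[f[3]])
    if tag == "eq":
        return ("eq", e[f[1]], e[f[2]])
    if tag == "and":
        return ("and", t_ci(f[1], e), t_ci(f[2], e))
    if tag == "not":
        return ("not", t_ci(f[1], e))
    if tag == "ex":
        k, v, body = f[1], f[2], f[3]
        w = OTHER[v]
        # Invariant of Figure 18: the other variable, if bound, sits at index 1.
        if w in e and e[w] != 1:
            raise ValueError("formula is not alternating")
        return ("card", k, t_ci(body, {v: 1, w: 2}))
    raise ValueError(f"unknown C² formula: {f!r}")


sentence = ("ex", 1, "y", ("ex", 2, "x", ("f", "f", "y", "x")))
print(t_ci(sentence, {}))  # ('card', 1, ('card', 2, ('f', 'f', 2, 1)))
```

On this input the sketch inverts the example shown for Figure 17.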

The alternating transformation that precedes the translation may cause an exponential blowup of the formula size due to the translation to disjunctive normal form, but for most formulas the transformation need not be applied. Moreover, if we allow introducing new predicate names, then we may replace $\exists^{\geq k_1} x.\, B(\exists^{\geq k_2} x.\, G(x, y))$ with $\exists^{\geq k_1} x.\, B(P(y))$ and conjoin the topmost formula with the formula $\forall y.\, (P(y) \iff \exists^{\geq k_2} x.\, G(x, y))$. Such a transformation can be performed in linear time and preserves the satisfiability of formulas (see [30, Section 2.1, Page 18] and [30, Lemma 2.3]).


From I² to RL². Figure 19 presents the translation from I² to RL², which is simple and does not require a translation environment. The translation algorithm runs in linear time and produces an RL² formula whose size is linear in the size of the original I² formula.
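Because Figure 19 is a finite table plus homomorphic cases, its correctness criterion can be tested exhaustively on tiny structures. The Python sketch below assumes a tuple encoding for I² and RL² and reads the RL² semantics off Figure 20: A means A(⟨1⟩), f means f(⟨2⟩, ⟨1⟩), ~F swaps the two indices, and F' evaluates F with index 1 set to the value of index 2. It is a test harness, not a decision procedure.

```python
from itertools import product

# Hypothetical tuple encodings (assumptions of this sketch):
#   I²:  ("A", a, i) | ("f", f, i1, i2) | ("eq", i1, i2) | ("and", F, G) | ("not", F)
#        | ("card", k, F)                                    with indices i in {1, 2}
#   RL²: ("A", a) | ("f", f) | ("EQ",) | ("true",) | ("and", F, G) | ("not", F)
#        | ("card", k, F) | ("inv", F) for ~F | ("prime", F) for F'

def t_ir(g):
    """Figure 19: translate an I² formula to RL²."""
    tag = g[0]
    if tag == "A":
        return ("A", g[1]) if g[2] == 1 else ("prime", ("A", g[1]))
    if tag == "f":
        base = ("f", g[1])
        table = {(2, 1): base, (1, 2): ("inv", base),
                 (2, 2): ("prime", base), (1, 1): ("inv", ("prime", base))}
        return table[(g[2], g[3])]
    if tag == "eq":
        return ("EQ",) if g[1] != g[2] else ("true",)
    if tag == "and":
        return ("and", t_ir(g[1]), t_ir(g[2]))
    if tag == "not":
        return ("not", t_ir(g[1]))
    if tag == "card":
        return ("card", g[1], t_ir(g[2]))
    raise ValueError(f"unknown I² formula: {g!r}")

def eval_i2(g, s, e1, e2):
    """I² semantics: <i> denotes e_i; card binds a new <1> and shifts <1> to <2>."""
    env = {1: e1, 2: e2}
    tag = g[0]
    if tag == "A":
        return env[g[2]] in s["A"][g[1]]
    if tag == "f":
        return (env[g[2]], env[g[3]]) in s["f"][g[1]]
    if tag == "eq":
        return env[g[1]] == env[g[2]]
    if tag == "and":
        return eval_i2(g[1], s, e1, e2) and eval_i2(g[2], s, e1, e2)
    if tag == "not":
        return not eval_i2(g[1], s, e1, e2)
    return sum(eval_i2(g[2], s, o, e1) for o in s["objs"]) >= g[1]  # "card"

def eval_rl(g, s, e1, e2):
    """RL² semantics: A = A(<1>), f = f(<2>, <1>), EQ = (<2> = <1>)."""
    tag = g[0]
    if tag == "A":
        return e1 in s["A"][g[1]]
    if tag == "f":
        return (e2, e1) in s["f"][g[1]]
    if tag == "EQ":
        return e1 == e2
    if tag == "true":
        return True
    if tag == "and":
        return eval_rl(g[1], s, e1, e2) and eval_rl(g[2], s, e1, e2)
    if tag == "not":
        return not eval_rl(g[1], s, e1, e2)
    if tag == "card":
        return sum(eval_rl(g[2], s, o, e1) for o in s["objs"]) >= g[1]
    if tag == "inv":
        return eval_rl(g[1], s, e2, e1)
    return eval_rl(g[1], s, e2, e2)  # "prime": index 1 takes the value of index 2

def criterion_holds(g, objs=(0, 1)):
    """Check [[T_IR[g]]]e = [[g]]e on all structures (one A, one f) and environments."""
    pairs = [(a, b) for a in objs for b in objs]
    for bits_a in product([0, 1], repeat=len(objs)):
        for bits_f in product([0, 1], repeat=len(pairs)):
            s = {"objs": objs,
                 "A": {"A": {o for o, b in zip(objs, bits_a) if b}},
                 "f": {"f": {p for p, b in zip(pairs, bits_f) if b}}}
            for e1, e2 in product(objs, repeat=2):
                if eval_rl(t_ir(g), s, e1, e2) != eval_i2(g, s, e1, e2):
                    return False
    return True

print(criterion_holds(("f", "f", 1, 1)))  # True
```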


4.4 From RL² to D²: Closing the Loop


In the final step, we provide a translation from RL² formulas to D² formulas. The logic D² is a convenient target for the translation of RL² formulas. (Namely, a simple attempt at translation from RL² to I² runs into the following difficulty. Formula $(\mathit{card}^{\geq 1} f)'$ is equivalent to $\mathit{card}^{\geq 1} f(\langle 3 \rangle, \langle 1 \rangle)$, which uses the index $\langle 3 \rangle$, not available in I². Similarly, an attempt to translate from RL² to C² runs into the difficulty of variable capture.)

Figure 20 presents the translation from RL² to D². The correctness of the translation follows by induction on the
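$e\,0 \in \mathit{Nat}, \qquad e\,k \in \{y_1, y_2, \ldots\} \text{ for } k \in \{1, 2\}$

$$
\begin{aligned}
T_{RD}[A]e &= A(e\,1) \\
T_{RD}[f]e &= f(e\,2, e\,1) \\
T_{RD}[\mathit{EQ}]e &= (e\,2) = (e\,1) \\
T_{RD}[F_1 \land F_2]e &= (T_{RD}[F_1]e) \land (T_{RD}[F_2]e) \\
T_{RD}[\lnot F]e &= \lnot(T_{RD}[F]e) \\
T_{RD}[\mathit{card}^{\geq k} F]e &= \exists^{\geq k} v.\, T_{RD}[F](e[0 \mapsto n, 1 \mapsto v, 2 \mapsto (e\,1)]) \\
&\quad \text{where } n = (e\,0) + 1,\ v = y_n \\
T_{RD}[{\sim}F]e &= T_{RD}[F](e[1 \mapsto (e\,2), 2 \mapsto (e\,1)]) \\
T_{RD}[F']e &= T_{RD}[F](e[1 \mapsto (e\,2)])
\end{aligned}
$$

correctness criterion: $[\![T_{RD}[F]e]\!]e_D = [\![F]\!](e_D \circ e)$

result is in D²: $FV(T_{RD}[F]e) \subseteq \{e\,1, e\,2\}$

Figure 20: Translating RL² formulas to D² formulas.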


structure of formulas. Furthermore, each subformula $G_1$ of a formula $T_{RD}[F]e$ is of the form $G_1 = T_{RD}[G]e'$ for some $G$ and $e'$, and by induction it follows that the free variables of $T_{RD}[G]e'$ are among $\{e'\,1, e'\,2\}$. Therefore, $|FV(G_1)| \leq 2$ and the result of the translation is a D² formula.
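A Python sketch of the translation in Figure 20, assuming a tuple encoding and an environment that stores a counter at key 0 and the variable names for indices 1 and 2. Drawing a fresh name y_n from the counter at each card is this sketch's way of avoiding variable capture, and the initial environment E0 is an arbitrary choice. The helper is_two_variable checks the D² property stated in the figure.

```python
# Hypothetical tuple encodings (assumptions of this sketch):
#   RL²: ("A", a) | ("f", f) | ("EQ",) | ("true",) | ("and", F, G) | ("not", F)
#        | ("card", k, F) | ("inv", F) for ~F | ("prime", F) for F'
#   D²:  ("A", a, v) | ("f", f, v1, v2) | ("eq", v1, v2) | ("true",) | ("and", F, G)
#        | ("not", F) | ("ex", k, v, F) for ∃^{≥k} v. F

def t_rd(g, e):
    """Translate an RL² formula to D² (a sketch of Figure 20)."""
    tag = g[0]
    if tag == "A":
        return ("A", g[1], e[1])
    if tag == "f":
        return ("f", g[1], e[2], e[1])
    if tag == "EQ":
        return ("eq", e[2], e[1])
    if tag == "true":
        return ("true",)
    if tag == "and":
        return ("and", t_rd(g[1], e), t_rd(g[2], e))
    if tag == "not":
        return ("not", t_rd(g[1], e))
    if tag == "card":
        n = e[0] + 1
        v = f"y{n}"  # fresh: every variable in e was created with index <= e[0]
        return ("ex", g[1], v, t_rd(g[2], {0: n, 1: v, 2: e[1]}))
    if tag == "inv":
        return t_rd(g[1], {**e, 1: e[2], 2: e[1]})
    if tag == "prime":
        return t_rd(g[1], {**e, 1: e[2]})
    raise ValueError(f"unknown RL² formula: {g!r}")

def fv(h):
    tag = h[0]
    if tag == "A":
        return {h[2]}
    if tag in ("f", "eq"):
        return set(h[-2:])
    if tag == "true":
        return set()
    if tag == "and":
        return fv(h[1]) | fv(h[2])
    if tag == "not":
        return fv(h[1])
    return fv(h[3]) - {h[2]}  # "ex"

def is_two_variable(h):
    """True if every subformula of h has at most two free variables."""
    if len(fv(h)) > 2:
        return False
    if h[0] == "and":
        return is_two_variable(h[1]) and is_two_variable(h[2])
    if h[0] == "not":
        return is_two_variable(h[1])
    if h[0] == "ex":
        return is_two_variable(h[3])
    return True

E0 = {0: 2, 1: "y1", 2: "y2"}  # assumed initial environment
# (card^{≥1} f)' from the text: the nested f refers to the outer index 2.
print(t_rd(("prime", ("card", 1, ("f", "f"))), E0))
# ('ex', 1, 'y3', ('f', 'f', 'y2', 'y3'))
```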


Summary. As indicated in Figure 10, we have presented translations from D² to C², from C² to I², from I² to RL², and from RL² to D². We conclude that D², C², I², and RL² are all equivalent logics, and, by [30], decidable.

The satisfiability problem for C² formulas is shown to be NEXPTIME-complete in [57]. We have observed that there are efficient polynomial transformations of formulas from D² to C², from C² to I², from I² to RL², and from RL² to D² that yield formulas equivalent for satisfiability. (Moreover, all transformations except the one from C² to I² yield equivalent formulas in the same vocabulary.) As a result, the satisfiability problem of all these logics is NEXPTIME-complete.


5 Applications of Role Logic 


We next present three applications of role logic. In Section 5.1 we present a shape analysis technique based on generating verification conditions in RL² and applying the decision procedure for RL². In Section 5.2 we note that boolean shape analysis constraints [48] are a subset of constraints expressible in role logic. In Section 5.3 we show that a different subset of RL² corresponds to an expressive description logic [1, Chapter 5].


5.1 Static Analysis Based on RL²


This section shows how to use the decidability of RL² for static analysis of imperative programs. Figure 21 presents the syntax of a simple imperative language. Figure 22 presents predicates in RL² that describe the meaning of statements in this language.


Program state. The state of the program is a first-order structure interpreting the language $L = \mathcal{A} \cup \mathcal{F}$, where $\mathcal{A}$ is a finite set of unary predicates and $\mathcal{F}$ is a finite set of binary predicates. We fix a countable universe of objects $\mathit{obj}$, and assume that each structure has the same universe $\mathit{obj}$. To specify the structure, it suffices to give the set $e\,A \subseteq \mathit{obj}$ for each unary predicate $A \in \mathcal{A}$, and a binary relation $e\,f \subseteq \mathit{obj} \times \mathit{obj}$ for each binary predicate $f \in \mathcal{F}$.


Extended language. For each $k \in \{\epsilon, 0, 1, \ldots\}$ we define the language $L_{(k)}$. We identify $L_{(\epsilon)}$ with $L$, $A_{(\epsilon)}$ with $A$, and $f_{(\epsilon)}$ with $f$. For $k \in \{0, 1, \ldots\}$, we let $A_{(k)}$ be a fresh unary predicate symbol, $f_{(k)}$ a fresh binary predicate symbol, and $L_{(k)}$ the set of all $A_{(k)}$ and $f_{(k)}$. The notation $\mathit{formRen}(i \to j)\, F$ for $i, j \in \{\epsilon, 0, 1, 2, \ldots\}$ denotes the formula resulting from $F$ by replacing all elements of $L_{(i)}$ with the corresponding elements of $L_{(j)}$.


Describing relations in the extended language. The meaning of each statement in our imperative language is a binary relation on $L$-structures. We describe a binary relation on structures with an RL² formula in the language $L_{(0)} \cup L_{(\epsilon)}$. The predicates in $L_{(\epsilon)}$ denote the state components in the final state; the predicates in $L_{(0)}$ denote the state components in the initial state. If $F$ is a formula in language $L_{(\epsilon)}$, then $\overline{F}$ is a shorthand for the formula $\mathit{formRen}(\epsilon \to 0)\, F$ in the language $L_{(0)}$; the purpose of $\overline{F}$ is to denote the value of the formula $F$ evaluated in the initial state.

Define the renaming operator $\mathit{strucRen}(i \to j)$ such that if $e_{(i)}$ is an $L_{(i)}$-structure, then $e_{(j)} = \mathit{strucRen}(i \to j)\, e_{(i)}$ is an $L_{(j)}$-structure such that $e_{(j)} A_{(j)} = e_{(i)} A_{(i)}$ and $e_{(j)} f_{(j)} = e_{(i)} f_{(i)}$ for all $A, f \in L$. Then the relation on $L$-structures denoted by an RL² formula $F$ in language $L_{(0)} \cup L_{(\epsilon)}$ is $\{(e, e') \mid [\![F]\!]((\mathit{strucRen}(\epsilon \to 0)\, e) \cup e')\}$.

Assignment statements. The imperative language in Figure 21 contains three forms of assignment statements.

The statement $A := F$ evaluates the formula $F$, which denotes a unary predicate. The statement makes $A$ true precisely for those objects for which $F$ was true in the initial state. Unary predicates other than $A$, as well as binary predicates, remain unchanged.

The statement $F_1.f := F_2$ generalizes the statement x.f = y in a language like Java by allowing simultaneous modification of fields of a set of objects. Formula $F_1$ specifies the set of objects whose fields are modified. Formula $F_2$ specifies the new value of the field $f$ for objects in $F_1$. Unary predicates and binary predicates other than $f$ remain unchanged. Note that $F_2$ may specify a relation, which is particularly interesting when $F_1$ denotes a set with more than one element, because it allows the value of the field to depend on the source object of the field. As a special case, $F_1.f := g$ copies the entire field $g$ into field $f$ for all objects in the set given by $F_1$, and, in particular, $\mathit{true}.f := g$ copies the field $g$ into $f$. The statement $F_1.{\sim}f := F_2$ is dual to $F_1.f := F_2$, and updates the inverse of the predicate $f$.
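To make the update semantics concrete, the following Python sketch applies $F_1.f := F_2$ and $F_1.{\sim}f := F_2$ to a finite relation. It assumes that $F_1$ and $F_2$ have already been evaluated in the initial state, to a set of source objects and a set of (source, target) pairs respectively; the function names are hypothetical.

```python
def field_update(f_old, sources, new_value):
    """F1.f := F2 on a finite structure.
    f_old: set of (source, target) pairs; sources: objects satisfying F1 in the
    initial state; new_value: pairs satisfying F2 in the initial state."""
    kept = {(a, b) for (a, b) in f_old if a not in sources}     # f unchanged outside F1
    updated = {(a, b) for (a, b) in new_value if a in sources}  # f given by F2 inside F1
    return kept | updated


def inverse(rel):
    return {(b, a) for (a, b) in rel}


def inverse_field_update(f_old, sources, new_value):
    """F1.~f := F2: the same update applied to the inverse relation ~f."""
    return inverse(field_update(inverse(f_old), sources, new_value))


objs = range(4)
f = {(0, 1), (1, 2), (2, 3)}
# Java-like x.f = y with x = 0 and y = 3: F1 = {0}, and F2 relates every source to 3.
print(sorted(field_update(f, {0}, {(o, 3) for o in objs})))  # [(0, 3), (1, 2), (2, 3)]
```

With F1 = true (all objects), field_update returns exactly the pairs of F2, which matches the remark that true.f := g copies g into f.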
Statements for specification. The statement assume $F$ filters out the state transitions for which $F$ does not hold in the initial state. The statement assert $F$ behaves arbitrarily if the condition given by $F$ does not hold in the initial state. The state contains an additional predicate Error, which makes it easier to detect that an arbitrary behavior
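$$
\begin{aligned}
[\![P_1 \sqsubseteq P_2]\!] &= \big([\![S_1]\!] \land \lnot [\![S_2]\!](B_1 \mapsto A_1, \ldots, B_n \mapsto A_n)\big) \text{ is not satisfiable, where:} \\
&\qquad P_1(A_1, \ldots, A_n) = S_1, \quad P_2(B_1, \ldots, B_n) = S_2, \\
&\qquad [\![S_2]\!] \text{ has no fresh predicates} \\
[\![A := F]\!] &= [A \Leftrightarrow \overline{F}] \land \mathit{modUnary}\ A \\
[\![F_1.f := F_2]\!] &= [\overline{F_1} \Rightarrow [f \Leftrightarrow \overline{F_2}]] \land [\lnot \overline{F_1} \Rightarrow [f \Leftrightarrow \overline{f}]] \land \mathit{modBinary}\ f \\
[\![F_1.{\sim}f := F_2]\!] &= [\overline{F_1} \Rightarrow [{\sim}f \Leftrightarrow \overline{F_2}]] \land [\lnot \overline{F_1} \Rightarrow [{\sim}f \Leftrightarrow {\sim}\overline{f}]] \land \mathit{modBinary}\ f \\
[\![P(F_1, \ldots, F_n)]\!] &= [\![S]\!](A_1 \mapsto F_1, \ldots, A_n \mapsto F_n) \quad \text{where } P(A_1, \ldots, A_n) = S \\
[\![\mathtt{assume}\ F]\!] &= \overline{F} \land \mathit{skip} \\
[\![\mathtt{assert}\ F]\!] &= \overline{F} \Rightarrow \mathit{skip} \\
[\![\mathtt{spec}\ F]\!] &= [\![F]\!] \\
[\![s_1 \land s_2]\!] &= [\![s_1]\!] \land [\![s_2]\!] \\
[\![s_1 \lor s_2]\!] &= [\![s_1]\!] \lor [\![s_2]\!] \\
[\![s_1 ; s_2]\!] &= \mathit{formRen}(\epsilon \to k)\,[\![s_1]\!] \land ([\lnot \mathit{Error}_{(k)}] \Rightarrow \mathit{formRen}(0 \to k)\,[\![s_2]\!]) \\
&\qquad k \text{ a fresh element of } \{1, 2, \ldots\} \\
[\![\mathtt{modify}\ E]\!] &= M[\![E]\!] \\
\mathit{modUnary}\ A &= \textstyle\bigwedge_{B \neq A} [B \Leftrightarrow \overline{B}] \land \bigwedge_{g} [g \Leftrightarrow \overline{g}] \land [\mathit{Error} \Leftrightarrow \overline{\mathit{Error}}] \\
\mathit{modBinary}\ f &= \textstyle\bigwedge_{B} [B \Leftrightarrow \overline{B}] \land \bigwedge_{g \neq f} [g \Leftrightarrow \overline{g}] \land [\mathit{Error} \Leftrightarrow \overline{\mathit{Error}}] \\
\mathit{skip} &= \textstyle\bigwedge_{B} [B \Leftrightarrow \overline{B}] \land \bigwedge_{g} [g \Leftrightarrow \overline{g}] \land [\mathit{Error} \Leftrightarrow \overline{\mathit{Error}}]
\end{aligned}
$$

Figure 22: Predicates Describing the Semantics of the Language from Figure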


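$$
\begin{array}{rcll}
F & & \text{a role logic formula} & \\
A & & \text{unary predicate} & \\
f & & \text{binary predicate} & \\
\mathit{procedure} & ::= & \mathit{procName}(\mathit{unaryList}) = \mathit{stat} & \\
\mathit{refinement} & ::= & \mathit{procName} \sqsubseteq \mathit{procName} & \\
\mathit{unaryList} & ::= & A \mid \mathit{unaryList}, A & \\
\mathit{stat} & ::= & \mathit{asgnStat} & \text{assignment statement} \\
 & \mid & \mathit{procName}(\mathit{paramList}) & \text{procedure call} \\
 & \mid & \mathtt{assume}\ F & \text{assume statement} \\
 & \mid & \mathtt{assert}\ F & \text{assert statement} \\
 & \mid & \mathtt{spec}\ F_E & \text{specification} \\
 & \mid & \mathit{stat} \lor \mathit{stat} & \text{non-deterministic choice} \\
 & \mid & \mathit{stat} \land \mathit{stat} & \text{conjunction} \\
 & \mid & \mathit{stat} ; \mathit{stat} & \text{sequential composition} \\
\mathit{asgnStat} & ::= & A := F & \text{update of unary predicate} \\
 & \mid & F_1.f := F_2 & \text{update of binary predicate} \\
 & \mid & F_1.{\sim}f := F_2 & \text{update of inverse of binary predicate} \\
F_E & ::= & A \mid f \mid \mathit{EQ} \mid F_E \land F_E \mid \lnot F_E \mid F_E' \mid {\sim}F_E \mid \mathit{card}^{\geq k} F_E & \\
 & \mid & \mathit{asgnStat} \mid \mathtt{modify}\ \mathit{items} \mid \mathit{procName}(\mathit{paramList}) & \\
\mathit{paramList} & ::= & F \mid \mathit{paramList}, F & \\
\mathit{items} & ::= & \mathit{modItem} \mid \mathit{items}, \mathit{modItem} & \\
\mathit{modItem} & ::= & A \mathrel{\texttt{:<=}} F & \text{modification of unary predicate} \\
 & \mid & F_1.f \mathrel{\texttt{:<=}} F_2 & \text{modification of binary predicate} \\
 & \mid & F_1.{\sim}f \mathrel{\texttt{:<=}} F_2 & \text{modification of inverse of binary predicate}
\end{array}
$$

Figure 21: Syntax of a Small Imperative Language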
    proc assignClients() =
      spec old(GlobalInvariant) =>
        (modify WaitingClients, AssignedClients,
                old(WaitingClients).server :<= Servers,
                Servers.clients :<= old(WaitingClients)) &
        !{WaitingClients} &
        [AssignedClients <=>
           old(AssignedClients | WaitingClients)] &
        GlobalInvariant

    proc assignOneClient(cl) =
      spec old(GlobalInvariant) &
           [cl => old(WaitingClients)] =>
        (modify WaitingClients, AssignedClients,
                cl.server :<= Servers,
                Servers.clients :<= cl) &
        [WaitingClients | cl <=> old(WaitingClients)] &
        [AssignedClients <=> old(AssignedClients) | cl] &
        GlobalInvariant

Figure 24: Specifications for assignClients and assignOneClient extended with side effect specifications.


occurred (the sequential composition operator ensures that 
the Error value is propagated). 

The statement spec $F_E$ allows describing relations on states directly in terms of an extended RL² formula $F_E$. Formula $F_E$ allows assignment statements and modify statements in addition to the constructs of RL². The relation symbols of RL² may refer to relation symbols of the extended language, which allows stating relations between pre- and postconditions. We also allow non-recursive procedure calls in the specification when they expand to constructs not containing sequential composition.


modify specifications. The construct

modify $e_1, \ldots, e_n$

is useful for specifying frame conditions. Each expression $e_i$ specifies a set of possible modifications. Any finite number of modifications can occur as the result of the action specified by the modify specification.


Example 8 Figure 24 shows the specifications assignClients and assignOneClient from Figure 3 extended with frame-condition specifications. The frame condition for assignOneClient specifies that only the sets WaitingClients and AssignedClients can change, which is useful if the system contains some additional set of objects, such as a set ProcessedClients. Next, the frame condition specifies that the only binary relations that were modified are server and clients. The modifies expression (Servers.clients :<= cl) indicates that the only way in which the clients relation is changed is by introducing an edge from a Servers object to the cl object, or by removing an edge from a Servers object. (The removal of the edge does not, in fact, occur in assignOneClientIMPL in Figure 3, but the frame condition is a conservative approximation.) The amount of detail in specifications such as modifies clauses depends on how strong a property we need to prove. The strength of the property, in turn, depends either on some high-level program correctness requirement, or on the amount of information we need about the procedure to prove the properties of its callers. In Figure 3, we did not use a modify specification for assignOneClient because we did not need it to prove the conformance of assignClientsIMPL with respect to assignClients. However, even in Figure 3 we needed to know that, for example, getServer preserves the global invariant, which follows from the fact that it does not modify any sets or relations (the conjunction with skip implies that getServer is a pure function).




In general, there are three forms of modification expressions. The expression A :<= F specifies modifications that remove an element from the set $A$ or insert into $A$ an element that satisfies $F$. For example, after executing the statement


modify A :<= F


the set $A$ may contain any subset of the set of objects given by the expression $A \lor F$. The expression $F_1$.f :<= $F_2$ specifies modifications that 1) remove a tuple $(o_1, o_2)$ from the relation interpreting the predicate $f$, when $o_1$ satisfies $F_1$, or 2) insert a tuple $(o_1, o_2)$ into the relation interpreting $f$, when $o_1$ satisfies $F_1$ and $(o_1, o_2)$ satisfies $F_2$. Similarly, $F_1$.~f :<= $F_2$ allows removing $(o_1, o_2)$ from the interpretation of $f$ when $o_2$ satisfies $F_1$, or inserting $(o_1, o_2)$ when $o_2$ satisfies $F_1$ and $(o_1, o_2)$ satisfies ${\sim}F_2$.

If $r_i$ is the relation describing a modification given by the expression $e_i$, then the meaning of modify $e_1, \ldots, e_n$ is given by the relation

$$(r_1 \cup \cdots \cup r_n)^* \qquad (4)$$

where $r^*$ denotes the reflexive transitive closure of relation $r$. The simple semantics (4) provides good intuition about the meaning of the modify statement and makes it clear that the modify statement is idempotent [44]. Figure 23 presents an alternative semantics, which directly encodes a modify statement as an RL² formula. The advantage of the semantics in Figure 23 is that it eliminates the need for the transitive closure of the transition relation.
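The closure semantics (4) can be explored on a small state space. This Python sketch assumes a single modification A :<= F with F independent of A, enumerates the states reachable by any finite number of steps (including zero), and confirms on an example that A may end up as any subset of $A \lor F$; states are frozensets and all names are hypothetical.

```python
from itertools import chain, combinations


def steps(state, allowed):
    """One modification from A :<= F: remove any element of A, or insert an
    element satisfying F (F is assumed not to depend on A)."""
    for o in state:
        yield state - {o}
    for o in allowed:
        yield state | {o}


def reachable(start, allowed):
    """States related to start by the reflexive transitive closure of steps."""
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for t in steps(s, allowed):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def subsets(xs):
    xs = list(xs)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))}


A0, F = frozenset({0}), frozenset({1, 2})
# After "modify A :<= F", A is any subset of (old A) ∪ F.
print(reachable(A0, F) == subsets(A0 | F))  # True
```

Idempotence is visible here as well: every state reachable from a reachable state is already reachable from A0.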


Disjunction and conjunction. The language allows computing the disjunction and conjunction of statements. Disjunction $\lor$ has a natural interpretation as a non-deterministic choice of commands. Conjunction $\land$ is useful for combining nondeterministic statements. Logical operations on statements translate directly to the corresponding logical operations on RL² formulas.


Computing sequential composition. When encoding sequential composition of statements in RL², we introduce copies $L_{(i)}$ of predicate names in $L$ for $i \in \{1, 2, \ldots\}$. These copies of predicate names denote the values of predicates at program points between the initial and the final program state. Because the definition of relation composition $r_1 \circ r_2 = \{(x, z) \mid \exists y.\, (x, y) \in r_1 \land (y, z) \in r_2\}$ involves existential quantification over $y$, we treat the newly introduced predicates as being existentially quantified. The technique of introducing new predicate names allows us to precisely compute relation composition even for non-deterministic commands.
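The role of existential quantification in relation composition, and what goes wrong if the intermediate state were read universally (the situation discussed under refinement claims below), can be seen on explicit finite relations. The states in this Python sketch are arbitrary test data and the function names are hypothetical.

```python
def compose(r1, r2):
    """r1 ∘ r2 = {(s, t) | ∃m. (s, m) ∈ r1 ∧ (m, t) ∈ r2}: the intermediate state m
    is existentially quantified, like the fresh predicates L_(k)."""
    return {(s, t) for (s, m) in r1 for (m2, t) in r2 if m == m2}


def compose_universal(r1, r2, states):
    """The wrong reading {(s, t) | ∀m. (s, m) ∈ r1 ⇒ (m, t) ∈ r2}, which is what
    fresh predicates would mean if they ended up universally quantified."""
    return {(s, t) for s in states for t in states
            if all((m, t) in r2 for (s2, m) in r1 if s2 == s)}


states = {0, 1, 2, 5, 6}
r1 = {(0, 1), (0, 2)}  # a non-deterministic first statement
r2 = {(1, 5), (2, 6)}
print(sorted(compose(r1, r2)))  # [(0, 5), (0, 6)]
```

Under the universal reading, state 0 has no successor at all, while states without r1-successors are related to everything, so both readings differ sharply for non-deterministic commands.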


Procedure calls. The meaning of a procedure is also a 
relation on states, where the initial state is extended with 
one unary predicate symbol for each parameter name. In 
the simple translation of Figure 22, a procedure call identi- 
fies parameters with the sets that describe their values by 


M|modify e1,...,en] = 
let {e1,...,en} = 
{Ai <= Fy,..., Ak <= Fy, 


Fruii-froi <= Greqi,; a 
Fraae~fi4i <= Gi4i, fo 


in 


FEL frtis--ofm } fi=f 3) 
i<l 1<i 
A aka BAG. 
FEffetis fm} fiat 
i<l 


A A 


., Fifi <= Gi, 
.;Fin.~fm <= Gm} 


Ai A~Gi)) = fl 
fi=f 
<i 


Figure 23: Semantics of modify statement. 


performing the substitution. Substitution suffices to give se- 
mantics to procedures because we assume that the recursion 
is split using refinement claims. Loops are represented as re- 
cursive procedures, so we effectively require loop invariants. 


Refinement claims. If $P_1$ and $P_2$ are procedure names, the refinement claim $P_1 \sqsubseteq P_2$ is a proof obligation that the relation given by the body of procedure $P_1$ is contained in the relation given by the body of $P_2$. The intended use of the refinement claim is the specification of procedure summaries, which allows breaking the cycles in the call graphs of mutually recursive procedures. Figure 22 shows how each refinement claim reduces to a test of whether an RL² formula is satisfiable. When generating the RL² formula, we rename the parameters of $P_2$, replacing them with the corresponding parameters of $P_1$.

To ensure that the satisfiability test treats newly introduced predicates as existentially quantified, we impose the restriction that the translation $[\![S_2]\!]$ contains no newly introduced predicates from $L_{(i)}$ for $i \in \{1, 2, \ldots\}$. We impose this restriction because $[\![S_2]\!]$ appears under negation in the satisfiability test, so newly introduced predicates in $[\![S_2]\!]$ would be universally quantified, thus violating the semantics of sequential composition for non-deterministic statements. The restriction on $S_2$ is satisfied when $S_2$ contains no sequential composition, which is the case for a large class of procedure summaries.

By providing sufficiently many procedure summaries, the partial correctness of a program is reduced to a finite number of refinement claims. By discharging these claims using a decision procedure for RL², we decide the partial correctness of the program.

Fixpoint computation. If some procedure summaries 
are not supplied by the programmer, they can be inferred 
using fixpoint computation. An algorithm for fixpoint com- 
putation can be derived from the fixpoint semantics of 
mutually recursive procedures using abstract interpretation 
[19, 21, 20, 70]. A special case of this approach is to select a 
n= {O}ICAGARY | AAR IOP
n= A|CiAC2|AC 
fl of | Riv Re 


— atomic unary predicate 


Sh wmay 
ii 


— atomic binary predicate 


Figure 25: Boolean Shape Analysis Constraints expressed as a sublogic of RL²


finite subset of all RL² formulas and define a lattice structure on the set using the entailment of formulas. A simple way to define a finite subset of formulas is to consider only RL² formulas with quantifier depth at most $k$, for some $k \geq 1$. Boolean shape analysis constraints in Section 5.2 have quantifier depth at most two, so they can be used as a basis of fixpoint computation.


5.2 Describing Boolean Shape Analysis Constraints


Boolean Shape Analysis Constraints [48] are a natural lan- 
guage for describing dataflow facts of shape analyses [65]. 

Figure 25 presents the syntax of Boolean Shape Analysis Constraints as a subset of role logic. This presentation of Boolean Shape Analysis Constraints shows that they are a subset of the decidable fragment RL² of role logic. In fact, Boolean Shape Analysis Constraints do not use counting quantifiers, so they are already expressible in the two-variable predicate logic L² (without counting).


A note on usability of role logic. Anecdotal evidence of the usability of role logic is the fact that all results

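$$
\begin{array}{rcl}
C & ::= & A \mid C \sqcap C \mid \lnot C \mid {\geq} n\, R.C \\
R & ::= & f \mid R \sqcap R \mid \lnot R \mid U \mid R^{-} \mid R|_{C} \mid \mathit{id}(C) \\
A & : & \text{atomic unary predicate} \\
f & : & \text{atomic binary predicate}
\end{array}
$$

Figure 26: An Expressive Description Logic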


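$$
\begin{aligned}
[\![A]\!] &= A \\
[\![C_1 \sqcap C_2]\!] &= [\![C_1]\!] \land [\![C_2]\!] \\
[\![\lnot C]\!] &= \lnot [\![C]\!] \\
[\![{\geq} n\, R.C]\!] &= \mathit{card}^{\geq n}([\![R]\!] \land [\![C]\!]) \\
[\![f]\!] &= f \\
[\![R_1 \sqcap R_2]\!] &= [\![R_1]\!] \land [\![R_2]\!] \\
[\![\lnot R]\!] &= \lnot [\![R]\!] \\
[\![U]\!] &= \mathit{true} \\
[\![R^{-}]\!] &= {\sim}[\![R]\!] \\
[\![R|_{C}]\!] &= [\![R]\!] \land [\![C]\!] \\
[\![\mathit{id}(C)]\!] &= \mathit{EQ} \land [\![C]\!]
\end{aligned}
$$

Figure 27: Translation of an Expressive Description Logic to Role Logic with Two Variables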


of [48] were initially shown using role logic notation and then translated into the standard first-order logic notation. We have found the variable-free aspect of role logic convenient when showing the results of [48]. We have subsequently discovered the connection of role logic with C² [30], presented in Section 4, and the connection with description logics [1], presented in Section 5.3.


5.3 Encoding an Expressive Description Logic


Figure 26 presents an Expressive Description Logic fragment where roles have no transitive operators [1, Chapter 5]. Figure 27 presents the translation of the Expressive Description Logic into RL². The translation maps the concepts $C$ and roles $R$ of description logic into unary and binary predicates of role logic. The translation to RL² in Figure 27 implies that the description logic in Figure 26 is decidable. The fact that interesting description logics can be translated to RL² is not surprising once we have established that RL² and C² have equal expressive power. Nevertheless, it is interesting to observe the simplicity of the translation from the description logic to RL², which is partly due to the fact that both description logic and role logic avoid explicit occurrences of variables.
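The translation in Figure 27 can be checked against the standard set-theoretic semantics of the description logic on small structures. In this Python sketch the tuple encodings and function names are assumptions; the RL² evaluator follows the default-argument reading used in Figure 20, with a role $R$ read as relating the source ⟨2⟩ to the target ⟨1⟩.

```python
from itertools import product

# Hypothetical tuple encodings (assumptions of this sketch):
#   concepts: ("concept", A) | ("cand", C1, C2) | ("cnot", C) | ("atleast", n, R, C)
#   roles:    ("role", f) | ("rand", R1, R2) | ("rnot", R) | ("U",) | ("inverse", R)
#             | ("restrict", R, C) | ("id", C)
#   RL²:      ("A", a) | ("f", f) | ("EQ",) | ("true",) | ("and", F, G) | ("not", F)
#             | ("card", k, F) | ("inv", F)

def tr(d):
    """The translation of Figure 27, from DL concepts/roles to RL² formulas."""
    tag = d[0]
    if tag == "concept":
        return ("A", d[1])
    if tag == "role":
        return ("f", d[1])
    if tag in ("cand", "rand", "restrict"):
        return ("and", tr(d[1]), tr(d[2]))
    if tag in ("cnot", "rnot"):
        return ("not", tr(d[1]))
    if tag == "atleast":
        return ("card", d[1], ("and", tr(d[2]), tr(d[3])))
    if tag == "U":
        return ("true",)
    if tag == "inverse":
        return ("inv", tr(d[1]))
    if tag == "id":
        return ("and", ("EQ",), tr(d[1]))
    raise ValueError(f"unknown DL term: {d!r}")

def eval_rl(f, s, e1, e2):
    """RL² semantics with A = A(<1>), f = f(<2>, <1>), EQ = (<2> = <1>)."""
    tag = f[0]
    if tag == "A":
        return e1 in s["A"][f[1]]
    if tag == "f":
        return (e2, e1) in s["f"][f[1]]
    if tag == "EQ":
        return e1 == e2
    if tag == "true":
        return True
    if tag == "and":
        return eval_rl(f[1], s, e1, e2) and eval_rl(f[2], s, e1, e2)
    if tag == "not":
        return not eval_rl(f[1], s, e1, e2)
    if tag == "card":
        return sum(eval_rl(f[2], s, o, e1) for o in s["objs"]) >= f[1]
    return eval_rl(f[1], s, e2, e1)  # "inv"

def concept_ext(c, s):
    """Standard DL semantics of a concept: a set of objects."""
    tag, objs = c[0], s["objs"]
    if tag == "concept":
        return set(s["A"][c[1]])
    if tag == "cand":
        return concept_ext(c[1], s) & concept_ext(c[2], s)
    if tag == "cnot":
        return set(objs) - concept_ext(c[1], s)
    r, cc = role_ext(c[2], s), concept_ext(c[3], s)  # ("atleast", n, R, C)
    return {a for a in objs if sum((a, b) in r and b in cc for b in objs) >= c[1]}

def role_ext(r, s):
    """Standard DL semantics of a role: a set of (source, target) pairs."""
    tag, objs = r[0], s["objs"]
    full = {(a, b) for a in objs for b in objs}
    if tag == "role":
        return set(s["f"][r[1]])
    if tag == "rand":
        return role_ext(r[1], s) & role_ext(r[2], s)
    if tag == "rnot":
        return full - role_ext(r[1], s)
    if tag == "U":
        return full
    if tag == "inverse":
        return {(b, a) for (a, b) in role_ext(r[1], s)}
    if tag == "restrict":
        cc = concept_ext(r[2], s)
        return {(a, b) for (a, b) in role_ext(r[1], s) if b in cc}
    return {(a, a) for a in concept_ext(r[1], s)}  # ("id", C)

def agrees(d, is_concept, objs=(0, 1)):
    """Compare tr(d) with the DL semantics on all structures over objs
    with one concept name P and one role name r."""
    pairs = [(a, b) for a in objs for b in objs]
    for bits_a, bits_f in product(product([0, 1], repeat=len(objs)),
                                  product([0, 1], repeat=len(pairs))):
        s = {"objs": objs,
             "A": {"P": {o for o, bit in zip(objs, bits_a) if bit}},
             "f": {"r": {p for p, bit in zip(pairs, bits_f) if bit}}}
        for a in objs:
            for b in objs:
                if is_concept:  # a concept depends only on index <1>
                    ok = eval_rl(tr(d), s, a, b) == (a in concept_ext(d, s))
                else:           # a role relates source <2> = a to target <1> = b
                    ok = eval_rl(tr(d), s, b, a) == ((a, b) in role_ext(d, s))
                if not ok:
                    return False
    return True
```

For example, agrees(("atleast", 1, ("restrict", ("inverse", ("role", "r")), ("concept", "P")), ("concept", "P")), True) compares the translation of ${\geq}1\,(r^{-}|_P).P$ with its set semantics on all 64 structures over two objects.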
Using the rules

$$[\![R_1 \circ R_2]\!] = [\![R_1]\!] \circ [\![R_2]\!] \qquad [\![R^*]\!] = [\![R]\!]^*$$


we can translate operations on binary relations into the full role logic, but not into the decidable fragment RL². Decidability of interesting description logics that contain transitive closure but do not have the tree model property is an open problem [1, Page 214].


A note on terminology. The term “role” has different meanings in different formalisms for describing structures. In [43], a role corresponds to a unary predicate (set); in description logics [1], a role corresponds to a binary predicate (relation); and in entity-relationship diagrams in databases [16], a role corresponds to a position $i$ ($1 \leq i \leq n$) in the $n$-tuples of an $n$-ary relation. To avoid confusion, we use the well-established term $n$-ary “predicate” (or “relation”), but keep the name “role logic” for the logic described in Figure 9, because the term “role logic” appears appropriate regardless of the particular interpretation of the word “role”.


Description Logics Corresponding to C².¹ The result [10, Theorem 4] reports that the description logic without transitive closure and relation composition (denoted $\mathcal{DL} - \{\mathit{trans}, \mathit{compose}\}$) corresponds precisely to C². The results of Section 4 and [10] imply that our logic RL² has the same expressive power as $\mathcal{DL} - \{\mathit{trans}, \mathit{compose}\}$. One of the differences between RL² and $\mathcal{DL} - \{\mathit{trans}, \mathit{compose}\}$ is that RL² contains the prime operator $F'$ and does not contain the product operation of $\mathcal{DL} - \{\mathit{trans}, \mathit{compose}\}$. Another difference is the foundation of role logic on de Bruijn lambda calculus notation, as described in Section 3.


6 Related Work 


We initially developed role logic to provide a foundation for role analysis [43, 42]. We subsequently studied a simplification of role analysis constraints and showed a characterization of such constraints using formulas [46]. Parametric analysis based on three-valued logic was introduced in [64, 65], with interprocedural analysis in [61] and application to abstract data type verification in [52]. A characterization of dataflow facts used for shape analysis was presented in [71, 48]. A decidable logic for expressing connectivity properties of the heap was presented in [7].

Specifying the semantics of programs using predicates dates back to axiomatic program semantics [32, 24]. An approach that uses a first-order logic theorem prover tailored for program verification is [23].

Like [40, 39, 37, 55], in Section 5.1 we use an expressive yet decidable logic to encode fragments of straight-line code. Our approach differs primarily in using logic RL² over general graphs, whose decidability follows from the decidability of C², whereas [40, 39, 37, 55] use graph types whose decidability follows from the decidability of monadic second-order logic over trees. We expect that these two logics can be combined in a fruitful way.

We have extended our language with constructs that make it possible to directly express higher-level state transformations, an idea related to the chemical reaction model of [26, 27], the verification of database transactions [6], the simultaneous assignments of [55], and wide-spectrum languages [56, 3]. Verification of a form of modifies clauses using a theorem prover was presented in [50, 44]. Further approaches to pointer and shape analysis include [17, 68, 15, 29, 25, 28, 69].


¹Note added on 31 October 2003, after becoming aware of [10].


Description logics [1, 9] share many of the properties of 
role logic and have been traditionally applied to knowledge 
bases. It is likely that description logics can be used for 
shape analysis as well. It would be particularly interesting to 
consider description logics with transitive operators, whose 
decidability is related to the decidability of dynamic logic 
[31]. Reasoning about the satisfiability of expressive de- 
scription logics over all structures and over finite structures 
is presented in [13, 14]. Reasoning about entity-relationship 
diagrams [16] is presented in [51]. Some connections between 
object models and heap invariants are presented in [45, 35]. 

Like the Alloy modelling language [36], role logic com- 
bines the notation of predicate calculus with the notation of 
relational algebras. It may be possible to combine the nota- 
tion of Alloy with the notation of role logic, and to combine 
the benefits of bounded model checking used in the Alloy
Analyzer with the benefits of a decision procedure for RL².

A recent approach to reasoning about mutable imperative
data structures is separation logic [34, 59, 60, 12, 11]. We
are currently working on integrating some aspects of spatial
logic to support more flexible notation for records in role
logic.

Interactive theorem provers have also been used for
reasoning about dynamically allocated data structures [54, 2];
it may be interesting to incorporate a decision procedure for
RL² into these general tools.


7 Conclusions 


We believe that role logic notation is a convenient way of
expressing properties of first-order structures. First-order
structures are a natural way to model the state of
object-oriented programs, or the state of a knowledge base
or a database. Role logic can be combined with traditional
variable-based notation in a natural way. Furthermore,
interesting subsets of role logic are decidable. Decision
procedures for role logic can therefore enable shape analysis
of programs and offer benefits similar to those that
description logics offer for knowledge bases.